image_meaning_db

A self-contained semantic image search tool. Upload images (optionally with a description) to build up a database, then search by image to find the nearest neighbors by meaning. Runs as a single Docker service: a FastAPI backend that embeds images locally with CLIP (clip-ViT-B-32) and stores vectors in ChromaDB, served behind a minimal browser UI.

On first launch it auto-seeds the database with ~100 sample images from Lorem Picsum so you have something to search against immediately.

Prerequisites

You need Docker Engine and the Docker Compose plugin. If you don't already have them:

Linux (Ubuntu/Debian): follow the official install guide at https://docs.docker.com/engine/install/ubuntu/. After installing, add your user to the docker group so you don't need sudo:
```
sudo usermod -aG docker $USER
newgrp docker
```
macOS / Windows: install Docker Desktop from https://docs.docker.com/desktop/. Compose is bundled.

Verify it works:

```
docker --version
docker compose version
```

Running it

From the project root:

```
docker compose up -d --build
```

(If your Compose is the older standalone binary, use docker-compose with a hyphen instead.)

Then open http://localhost:8081 in your browser.

What to expect on the first run

The first docker compose up --build is slow because it:

Installs Python deps including CPU-only PyTorch (~200 MB pip download).
Downloads the CLIP model weights (~600 MB) into a cached volume on first server start.
Fetches 100 seed images from picsum.photos and embeds them.

Watch progress with:

```
docker compose logs -f backend
```

You'll see "Model clip-ViT-B-32 ready.", then "Seed: N images indexed..." messages as the database fills. The UI is usable throughout; refresh to see the image count climb.
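If you would rather watch the count from a script than refresh the page, something like the following sketch could poll GET /api/stats until seeding finishes. It assumes the default host port 8081 and the default seed count of 100; the function names, polling interval, and retry limit are illustrative and not part of the project. If some seed downloads fail, the count may never reach the target, which is why the loop is bounded.

```python
import json
import time
import urllib.request

# Assumes the default "8081:8080" port mapping from docker-compose.yml.
STATS_URL = "http://localhost:8081/api/stats"


def fetch_total_images(url=STATS_URL):
    """Read total_images from GET /api/stats."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)["total_images"]


def wait_for_seed(target=100, fetch=fetch_total_images, interval=2.0, max_polls=300):
    """Poll until at least `target` images are indexed; return the last count seen."""
    count = 0
    for _ in range(max_polls):
        try:
            count = fetch()
        except OSError:
            # Backend not accepting connections yet (for example, still loading the model).
            pass
        else:
            print(f"{count} images indexed")
            if count >= target:
                break
        time.sleep(interval)
    return count


# Example (requires the service to be running):
# wait_for_seed(target=100)
```

The fetch parameter exists only so the loop can be exercised without a running backend.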

Subsequent runs reuse the cached model and the existing database, so startup is fast.

Using the UI

Two tabs:

Submit Image — drop, paste (Ctrl/Cmd+V), or click to select an image. Add an optional description (e.g. "red coffee mug on wooden desk") and click Submit to Database. The image is embedded and stored.
Search by Image — drop/paste/select a query image. The backend embeds it and returns the most semantically similar stored images, ranked by cosine similarity, with any descriptions they were submitted with.
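
To make "ranked by cosine similarity" concrete, here is a minimal plain-Python sketch of the ranking step. The toy 3-dimensional vectors, file names, and function names are hypothetical; in the real service the vectors are CLIP embeddings and the nearest-neighbor lookup is presumably handled by ChromaDB rather than code like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, stored, n=10):
    """Return up to n (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy data standing in for real embeddings (hypothetical file names).
stored = {
    "mug.jpg": [0.9, 0.1, 0.0],
    "desk.jpg": [0.6, 0.6, 0.1],
    "beach.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
results = rank_by_similarity(query, stored, n=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A score near 1 means the two vectors point in almost the same direction; a score near 0 means they are unrelated.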

API

If you want to hit the backend directly:

POST /api/submit — multipart form: file (image), optional description (string). Returns {id, filename, total_images}.
POST /api/search — multipart form: file (image), optional query param n (default 10). Returns ranked list of matches with similarity scores.
GET /api/images/{filename} — serves a stored image.
GET /api/stats — {total_images: N}.
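
As a rough sketch of calling these endpoints from a script, the standard-library client below builds the multipart form by hand and posts it. It assumes the default host port 8081; the helper names (encode_multipart, post_image, submit, search) are made up for this example, and the response shapes are only those listed above. A library such as requests would make this shorter if you have it installed.

```python
import json
import mimetypes
import urllib.request
import uuid

# Assumes the default "8081:8080" port mapping from docker-compose.yml.
BASE_URL = "http://localhost:8081"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_image(path, filename, file_bytes, fields=None):
    """POST one image (plus optional form fields) and decode the JSON reply."""
    body, content_type = encode_multipart(fields or {}, "file", filename, file_bytes)
    request = urllib.request.Request(
        BASE_URL + path, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def submit(filename, file_bytes, description=None):
    """POST /api/submit with an optional description form field."""
    fields = {"description": description} if description else {}
    return post_image("/api/submit", filename, file_bytes, fields)


def search(filename, file_bytes, n=10):
    """POST /api/search, passing n as a query parameter."""
    return post_image(f"/api/search?n={n}", filename, file_bytes)


# Example (requires the service to be running and a local mug.jpg):
# with open("mug.jpg", "rb") as f:
#     data = f.read()
# print(submit("mug.jpg", data, "red coffee mug on wooden desk"))
# print(search("mug.jpg", data, n=5))
```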

Stopping and resetting

```
docker compose down             # stop containers, keep data
docker compose down -v          # also delete the database, cached model, and stored images
```

If you wipe volumes, the next start will re-download the CLIP model and re-seed the 100 sample images.

Configuration

Environment variables set in docker-compose.yml:

EMBEDDING_MODEL — sentence-transformers model name. Default: clip-ViT-B-32. If you change this, wipe the chroma_data volume — embedding dimensions must match across all stored vectors.
SEED_COUNT — number of sample images to seed on first launch. Default: 100. Set to 0 to skip seeding.

Host port mapping is also in docker-compose.yml; change the left side of "8081:8080" if 8081 conflicts with something else on your machine.
