Asterisk is a compact sentence embedding model designed to run inference on low-resource hardware — Raspberry Pi, older x86 machines, or anything without a GPU. Trained on a curated corpus with efficiency as the primary constraint, it produces dense vector representations suitable for semantic search, clustering, and RAG retrieval pipelines at a fraction of the cost of larger models.
I wanted to try using vector search for the feed-summarizer de-dupe/grouping logic in useful time, and to build a vector index over my ArchiveBox and Obsidian data without making millions of calls to an external provider and handing it all of my personal data.
And I wanted it _fast_, and able to run on one of the low-power ARM SBCs I have around. I ended up going down a rabbit hole that preceded go-gte. Partway through training on my potato RTX 3060, I decided that Go would be a great choice for both portability and maintainability (even if NEON on ARM is still much slower than the equivalent SIMD paths on Intel chips).
The model architecture is a distilled transformer encoder, trained with a contrastive objective on sentence pairs. It is not very smart or sophisticated (I chose it _precisely_ because I could understand it over lunch), but it works. I decided to use ONNX for portable, runtime-agnostic inference, and the repository includes training scripts, evaluation harnesses against standard STS benchmarks, and example inference code in Python via the ONNX Runtime.
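For readers wiring up their own inference: after the encoder runs, the per-token outputs are typically collapsed into one sentence vector by masked mean pooling. A numpy sketch of that step (mean pooling is an assumption here; check the repo's inference code for the exact pooling Asterisk uses):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Collapse per-token encoder outputs into one sentence vector.

    token_embeddings: (batch, seq, dim) last hidden state from the encoder
    attention_mask:   (batch, seq) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)  # avoid divide-by-zero on empty rows
    return summed / counts
```

The mask matters: averaging over padding tokens silently degrades embedding quality, which is an easy bug to miss when porting inference to a new runtime.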
Designed to run at useful speed on a 4-core ARM CPU — under 200 ms per sentence on a Pi 4 with ONNX Runtime.
Produces 256-dimensional vectors — small enough to store millions of embeddings in RAM on modest hardware.
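The footprint claim is easy to sanity-check, and at that scale exact nearest-neighbour search is just one matmul over L2-normalized rows. A numpy sketch (sizes illustrative, not measured from the repo):

```python
import numpy as np

# Memory footprint: 1M embeddings x 256 dims x 4 bytes (float32).
n, dim = 1_000_000, 256
bytes_f32 = n * dim * 4
print(f"{bytes_f32 / 2**30:.2f} GiB")  # ~0.95 GiB; float16 halves it

# Brute-force cosine search: with unit-norm rows, cosine similarity
# is a plain dot product, so the whole search is one matrix-vector product.
rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, dim)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)

query = index[42] + 0.01 * rng.standard_normal(dim)  # slightly perturbed copy of row 42
query /= np.linalg.norm(query)

scores = index @ query        # cosine similarity against every stored vector
best = int(np.argmax(scores))
print(best)                   # 42: the perturbed source row wins
```

At a few million vectors this brute-force scan is often fast enough on a SBC-class CPU that an approximate-NN index is unnecessary.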
Trained with a symmetric cross-entropy contrastive loss on sentence pairs — strong performance on semantic similarity despite the small size.
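That objective is the in-batch-negatives setup: cosine similarities between every pair in a batch form a logit matrix, and cross-entropy pulls the diagonal (the true pairs) up in both directions. A minimal numpy sketch with an illustrative temperature; the actual hyperparameters live in the repo's training scripts:

```python
import numpy as np

def symmetric_contrastive_loss(a, b, temperature=0.05):
    """Symmetric cross-entropy over in-batch negatives.

    a, b: (batch, dim) embeddings of paired sentences; row i of `a`
    and row i of `b` are a positive pair, all other rows are negatives.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) cosine similarity matrix

    def xent(l):
        # cross-entropy where the correct class for row i is column i
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the a->b and b->a directions (hence "symmetric")
    return 0.5 * (xent(logits) + xent(logits.T))
```

Matching pairs should drive the loss toward zero; shuffling one side so positives no longer line up should send it up sharply.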
Weights ship as an ONNX model — drop into any language with an ONNX Runtime binding: Go, Rust, C, Python, JavaScript.
Evaluated against STS-B and SICK-R; results and training curves included in the repo for reproducibility.
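STS-B and SICK-R are conventionally scored by the Spearman rank correlation between the model's cosine similarities and the human judgments. A dependency-free sketch of the metric (Pearson on ranks, no tie correction; real harnesses such as the one in the repo should handle ties properly):

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation via Pearson correlation on ranks (ignores ties)."""
    def rank(v):
        r = np.empty(len(v))
        r[np.argsort(v)] = np.arange(len(v))  # position of each value in sorted order
        return r

    rx, ry = rank(np.asarray(x, float)), rank(np.asarray(y, float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```

Rank correlation is used rather than Pearson because only the ordering of similarity judgments needs to match; the raw cosine scale is arbitrary.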