Asterisk is a compact sentence embedding model designed to run inference on low-resource hardware — Raspberry Pi, older x86 machines, or anything without a GPU. Trained on a curated corpus with efficiency as the primary constraint, it produces dense vector representations suitable for semantic search, clustering, and RAG retrieval pipelines at a fraction of the cost of larger models.
I wanted to try using vector search for the feed-summarizer de-dupe/grouping logic in useful time, and to build a vector index over my ArchiveBox and Obsidian data without making millions of calls to an external provider and handing it all of my personal data.
And I wanted it _fast_, and able to run on one of the low-power ARM SBCs I have around. I ended up going down a rabbit hole that preceded go-gte. Partway through training on my potato RTX 3060, I decided that Go would be a great choice for both portability and maintainability (even if NEON on ARM is still much slower than the equivalent SIMD paths on Intel chips).
The model architecture is a distilled transformer encoder, trained with a contrastive objective on sentence pairs. It is not very smart or sophisticated (I chose it _precisely_ because I could understand it over lunch), but it works. I decided to use ONNX for portable, runtime-agnostic inference, and the repository includes training scripts, evaluation harnesses against standard STS benchmarks, and example inference code in Python via the ONNX Runtime.
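For readers wiring up their own inference: after the encoder runs, the per-token outputs are typically collapsed into one sentence vector by masked mean pooling. A numpy sketch of that step (mean pooling is an assumption here; check the repo's inference code for the exact pooling Asterisk uses):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Collapse per-token encoder outputs into one sentence vector.

    token_embeddings: (batch, seq, dim) last hidden state from the encoder
    attention_mask:   (batch, seq) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)  # avoid divide-by-zero on empty rows
    return summed / counts
```

The mask matters: averaging over padding tokens silently degrades embedding quality, which is an easy bug to miss when porting inference to a new runtime.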
Designed to run at useful speed on a 4-core ARM CPU — under 200 ms per sentence on a Pi 4 with ONNX Runtime.
Produces 256-dimensional vectors — small enough to store millions of embeddings in RAM on modest hardware.
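The footprint claim is easy to sanity-check, and at that scale exact nearest-neighbour search is just one matmul over L2-normalized rows. A numpy sketch (sizes illustrative, not measured from the repo):

```python
import numpy as np

# Memory footprint: 1M embeddings x 256 dims x 4 bytes (float32).
n, dim = 1_000_000, 256
bytes_f32 = n * dim * 4
print(f"{bytes_f32 / 2**30:.2f} GiB")  # ~0.95 GiB; float16 halves it

# Brute-force cosine search: with unit-norm rows, cosine similarity
# is a plain dot product, so the whole search is one matrix-vector product.
rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, dim)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)

query = index[42] + 0.01 * rng.standard_normal(dim)  # slightly perturbed copy of row 42
query /= np.linalg.norm(query)

scores = index @ query        # cosine similarity against every stored vector
best = int(np.argmax(scores))
print(best)                   # 42: the perturbed source row wins
```

At a few million vectors this brute-force scan is often fast enough on a SBC-class CPU that an approximate-NN index is unnecessary.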
Trained with a symmetric cross-entropy contrastive loss on sentence pairs — strong performance on semantic similarity despite the small size.
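That objective is the in-batch-negatives setup: cosine similarities between every pair in a batch form a logit matrix, and cross-entropy pulls the diagonal (the true pairs) up in both directions. A minimal numpy sketch with an illustrative temperature; the actual hyperparameters live in the repo's training scripts:

```python
import numpy as np

def symmetric_contrastive_loss(a, b, temperature=0.05):
    """Symmetric cross-entropy over in-batch negatives.

    a, b: (batch, dim) embeddings of paired sentences; row i of `a`
    and row i of `b` are a positive pair, all other rows are negatives.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) cosine similarity matrix

    def xent(l):
        # cross-entropy where the correct class for row i is column i
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the a->b and b->a directions (hence "symmetric")
    return 0.5 * (xent(logits) + xent(logits.T))
```

Matching pairs should drive the loss toward zero; shuffling one side so positives no longer line up should send it up sharply.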
Weights ship as an ONNX model — drop into any language with an ONNX Runtime binding: Go, Rust, C, Python, JavaScript.
Evaluated against STS-B and SICK-R; results and training curves included in the repo for reproducibility.
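STS-B and SICK-R are conventionally scored by the Spearman rank correlation between the model's cosine similarities and the human judgments. A dependency-free sketch of the metric (Pearson on ranks, no tie correction; real harnesses such as the one in the repo should handle ties properly):

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation via Pearson correlation on ranks (ignores ties)."""
    def rank(v):
        r = np.empty(len(v))
        r[np.argsort(v)] = np.arange(len(v))  # position of each value in sorted order
        return r

    rx, ry = rank(np.asarray(x, float)), rank(np.asarray(y, float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```

Rank correlation is used rather than Pearson because only the ordering of similarity judgments needs to match; the raw cosine scale is arbitrary.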