Models
Best Match 25 (BM25)
- Sometimes called Okapi BM25, after the Okapi IR system
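- Adds two parameters:
- $k$: adjusts the balance between tf and idf
- $b$: controls the importance of document length normalization
- Also uses $|d_{avg}|$: the average document length in the collection
The BM25 score for a query $q$ and document $d$ is:
$$\text{score}(q, d) = \sum_{t \in q} \log\left(\frac{N}{\text{df}_t}\right) \cdot \frac{\text{tf}_{t,d}}{k\left(1 - b + b\,\frac{|d|}{|d_{avg}|}\right) + \text{tf}_{t,d}}$$
- Here $N$ is the number of documents in the collection, $\text{df}_t$ the number of documents containing term $t$, $\text{tf}_{t,d}$ the count of $t$ in $d$, and $|d|$ the length of $d$
- When $k = 0$, BM25 reverts to no use of term frequency, just a binary selection of terms in the query (plus idf)
- A large $k$ makes the score grow roughly linearly with raw term frequency (plus idf)
- $b$ ranges from 1 (full scaling by document length) to 0 (no length scaling)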
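A minimal Python sketch of BM25 scoring may make the roles of the two parameters concrete. It assumes the common form of the score, with idf computed as log(N / df) and the tf term as tf / (k(1 - b + b|d| / avg_len) + tf). The function name `bm25_scores`, the whitespace tokenization, the toy documents, and the defaults `k=1.2` and `b=0.75` are illustrative assumptions, not part of the notes.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k and b are the two BM25 parameters; the defaults are assumed,
    commonly used values, not prescribed by the notes.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: 1 when b = 0, |d| / avg_len when b = 1.
        length_norm = 1 - b + b * (len(d) / avg_len)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(n_docs / df[t])
            score += idf * tf[t] / (k * length_norm + tf[t])
        scores.append(score)
    return scores

# Toy collection (hypothetical data for illustration only).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]
print(bm25_scores("cat mat".split(), docs))
```

Calling `bm25_scores(query, docs, k=0.0)` reduces each matching term's contribution to its idf alone, which is the binary-selection behavior described above.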
Hierarchical Navigable Small World (HNSW)
https://www.pinecone.io/learn/series/faiss/hnsw/
Probability skip lists
Navigable small world (NSW) graphs
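As a rough illustration of how search on an NSW graph proceeds, the sketch below does greedy routing: start at an entry node and keep moving to whichever neighbor is closest to the query until no neighbor improves. The adjacency-dict representation, the name `greedy_search`, and the toy graph are assumptions made for this example; real HNSW implementations add layers and a candidate list on top of this idea.

```python
import math

def greedy_search(graph, vectors, query, entry):
    """Greedy routing: repeatedly hop to the neighbor nearest to `query`."""
    current = entry
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            # No neighbor is closer: a local (not necessarily global) minimum.
            return current, current_dist
        current, current_dist = best, best_dist

# Toy graph (hypothetical): the edge 0-3 is a long-range link, the rest are short-range.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0), 4: (3.0, 1.0)}
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2, 4], 4: [3]}

node, dist = greedy_search(graph, vectors, query=(3.0, 0.9), entry=0)
print(node, dist)
```

The long-range link lets the walk jump from node 0 to node 3 in one hop instead of stepping through nodes 1 and 2. Greedy routing can stop at a local minimum, which is one reason practical indexes keep several candidates rather than a single current node.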
FAISS
- Standalone vector indices