Fast experiment velocity for semantic search applications (e.g., recommendation and retrieval) is critical to realizing business impact, such as higher return on investment and lower infrastructure cost. The goal is straightforward, but achieving it is hard because it requires fundamental end-to-end (E2E) change. The E2E flow covers the four critical components shown in Figure 1: dataset management, embedding management, the vector search engine/DB, and application A/B test capability.
There are many possible experiment dimensions, covering both the embedding configuration and the search configuration.
(1) Embedding dimensions
Here is a list of embedding experiment dimensions:
- Transformation methods. Take image retrieval as an example: there are many pretrained models that can transform raw data into embeddings.
- Dimension size: 4096, 1024, 128, or 64. A lower dimension means fewer bytes stored per vector, which can save cost significantly, but the dimension can also impact search accuracy.
- Quantization to compress the original embeddings (e.g., float32 to int8). The cost/accuracy trade-off is the same idea as using a lower dimension.
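To make the quantization trade-off concrete, here is a minimal sketch of per-vector scalar quantization (float32 to int8) in plain Python. This is an illustration only, not the engine's actual compression scheme; production systems often use more sophisticated methods such as product quantization.

```python
# Scalar quantization sketch: map each float32 value to an int8 in
# [-127, 127] using a symmetric per-vector scale. Each value then fits
# in 1 byte instead of 4, a 4x storage reduction, at the cost of a
# small reconstruction error bounded by scale / 2 per element.

def quantize_int8(vec):
    """Quantize a float vector to int8 codes plus a scale factor."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0  # avoid zero scale
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    """Reconstruct an approximate float vector from int8 codes."""
    return [c * scale for c in codes]

vec = [0.12, -0.5, 0.33, 0.9]
codes, scale = quantize_int8(vec)
approx = dequantize(codes, scale)
```

The search engine would index the compact codes and (approximately) reconstruct distances at query time, which is why quantization, like a lower dimension, trades some accuracy for cost.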
(2) Search dimensions
Here is a list of search dimensions:
- Similarity measure: squared L2 (Euclidean distance), inner product, or cosine similarity.
- Search algorithms. There are different types of algorithms, such as partition-based (e.g., IVF) and graph-based (e.g., HNSW). Different algorithms have different performance (indexing and search) and accuracy characteristics.
- Memory or SSD as the storage medium for the index/embeddings in the engine. This is mainly a cost optimization lever.
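The three similarity measures above can be sketched in plain Python. A useful fact when choosing among them: for unit-normalized vectors, cosine similarity equals the inner product, and squared L2 becomes 2 minus twice the inner product, so all three produce the same ranking.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance (smaller means more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product(a, b):
    """Dot product (larger means more similar)."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Inner product of the two vectors after length normalization."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return inner_product(a, b) / (na * nb)

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
# b is a scaled copy of a, so cosine similarity is exactly 1.0, while
# squared L2 and inner product are sensitive to vector magnitudes.
```

Because the measures disagree for unnormalized vectors, the similarity measure is itself a meaningful experiment dimension rather than an implementation detail.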
To achieve faster experiment velocity, we need a flexible architecture that lets developers, data scientists, and machine learning engineers quickly experiment with different embedding and search settings. Vectorstore currently focuses on the vector search engine layer: customers can create different indexes for different settings and integrate them with online A/B experiments to see which setting is best. This greatly improves experiment velocity.
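As a hypothetical sketch of this workflow, the experiment dimensions above can be enumerated as candidate index configurations, each of which becomes one index (and one A/B test arm). The field names below are illustrative assumptions, not the actual Vectorstore API.

```python
# Enumerate candidate index configurations across experiment dimensions.
# Field names ("index_name", "dimension", etc.) are hypothetical.
from itertools import product

dims = [64, 128, 1024]
algorithms = ["ivf", "hnsw"]
metrics = ["l2", "inner_product", "cosine"]

configs = [
    {
        "index_name": f"exp_{algo}_{metric}_d{dim}",
        "dimension": dim,
        "algorithm": algo,
        "metric": metric,
    }
    for dim, algo, metric in product(dims, algorithms, metrics)
]
# 3 dims x 2 algorithms x 3 metrics = 18 candidate indexes; each could
# be created in the engine and registered as one arm of an A/B test.
```

In practice only a pruned subset of the full cross product would be created, since each index has its own build and serving cost.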