Amplifying Impact: Three Strategies to Enhance LLM Use Cases with Vector Databases

Prasun Mishra
4 min read · Jul 13, 2023


VectorDB + LLM

In a recent encounter, Tim spotted someone sporting a pair of exceptionally crafted dual-toned, metallic-finish leather shoes with colored laces. He was eager to inquire about the brand, but alas, the person had left post-meeting. Recognizing these shoes as something beyond the regular ‘off-the-rack’ varieties, Tim turned to his favored merchandise marketplace and inputted “metallic finish burgundy and black, formal leather shoes with dual-colored laces in size 10, for men”. To his delight, the exact match appeared as the third option. He promptly bought matching socks to complete the look.

This scenario exemplifies how semantic search helps industries such as e-commerce, retail, and marketplaces guide customers toward a purchase, something traditional keyword search would struggle with. The unsung hero behind the successful semantic search? Vector databases, known for delivering high performance even under heavy query volumes.

Unpacking Vector Databases: Vector databases treat vectors, or embeddings, as primary entities. They’ve earned their popularity due to their ability to store embeddings of unstructured data like images, text, video, audio, or event logs, simplifying the process of conducting semantic searches among them.

When aptly designed, vector databases coupled with Large Language Models (LLMs) can manage vast, high-dimensional data, paving the way for more refined, context-aware, and efficient natural language understanding applications.

Vector Databases and Large Language Models: A Powerful Duo: The synergy between vector databases and LLMs is potent. Let’s delve into the three most effective ways these databases augment LLM use cases and ensure a higher return on investment.

  1. As a knowledge base that provides ‘context’ from the enterprise. Here the vector DB acts as a knowledge extension for the LLM and can be queried to retrieve existing, similar information (context) from the knowledge base. This also eliminates the need to use sensitive enterprise data to train or fine-tune the LLM. Every time a question is asked, the following steps run (a minimal code sketch follows this item):

· The question is converted into an embedding (using the same embedding model used to index the knowledge base).

· The embedding is used to retrieve relevant context (or documents) from the vector database.

· The LLM prompt is created with the help of this context.

· The response is generated. The enterprise-specific context helps the LLM provide accurate output.

o Use cases: Document discovery, Chatbots, Q&A.

o Key benefits: Avoids using sensitive data for model training/fine-tuning. Cheaper than fine-tuning LLMs. The knowledge base can be updated in near real time.
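To make the flow concrete, here is a minimal sketch of this retrieval pattern. It assumes sentence-transformers and FAISS as the embedding model and vector index (a managed vector database such as Pinecone, Milvus, or Redis would play the same role), and `call_llm()` is a placeholder for whichever LLM API you use.

```python
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

# Index the enterprise knowledge base (done once, offline). Illustrative documents only.
documents = [
    "Refunds are processed within 5 business days.",
    "Premium members get free expedited shipping.",
    "Shoes can be exchanged within 30 days of purchase.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_embeddings.shape[1])  # inner product on normalized vectors = cosine
index.add(np.asarray(doc_embeddings, dtype="float32"))

def call_llm(prompt: str) -> str:
    # Placeholder: call your LLM provider here (OpenAI, Cohere, a local model, etc.).
    return "LLM response for: " + prompt

def answer(question: str, k: int = 2) -> str:
    # 1. Convert the question into an embedding.
    q_emb = np.asarray(model.encode([question], normalize_embeddings=True), dtype="float32")
    # 2. Retrieve the k most similar documents as context.
    _, idx = index.search(q_emb, k)
    context = "\n".join(documents[i] for i in idx[0])
    # 3. Build the prompt with the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 4. Generate the response with the enterprise-specific context in the prompt.
    return call_llm(prompt)

print(answer("How long do refunds take?"))
```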

  2. Acting as long-term memory for the LLM. This helps retrieve the last N messages relevant to the current message from the entire chat history, which can span multiple simultaneous sessions and historical interactions. It also helps bypass the LLM’s context-length (token) limitations and gives you more control. The key steps are (a minimal sketch follows this item):

· The user asks a query.

· The system retrieves the relevant stored embeddings (past messages) from the vector database and passes them to the LLM along with the query.

· The LLM response is generated and shared with the user. The response embedding (with the history) is also stored in the vector database.

o Use cases: Knowledge discovery, Chatbots.

o Key benefits: Bypasses the LLM’s token-length limitations and helps handle conversation topic changes.
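Here is a minimal sketch of the long-term-memory pattern, under the same assumptions as above (sentence-transformers + FAISS, with `call_llm()` standing in for your LLM API). Each chat turn is embedded and stored, and the N most relevant past turns are retrieved for every new message.

```python
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
memory_index = faiss.IndexFlatIP(384)   # one vector per stored chat turn
memory_texts: list[str] = []            # the turns themselves, aligned with the index

def call_llm(prompt: str) -> str:
    # Placeholder for your LLM API.
    return "..."

def chat(user_message: str, n: int = 5) -> str:
    q_emb = np.asarray(model.encode([user_message], normalize_embeddings=True), dtype="float32")
    # Retrieve the N past turns most relevant to the current message.
    history = ""
    if memory_index.ntotal > 0:
        _, idx = memory_index.search(q_emb, min(n, memory_index.ntotal))
        history = "\n".join(memory_texts[i] for i in idx[0])
    prompt = f"Relevant history:\n{history}\n\nUser: {user_message}"
    response = call_llm(prompt)
    # Store this exchange back into the vector store for future turns.
    turn = f"User: {user_message}\nAssistant: {response}"
    t_emb = np.asarray(model.encode([turn], normalize_embeddings=True), dtype="float32")
    memory_index.add(t_emb)
    memory_texts.append(turn)
    return response
```

Because only the relevant slices of history go into the prompt, the conversation can grow well past the model's context window without being truncated wholesale.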

  3. Caching previous LLM queries and responses. When a query is fired, create its embedding and do a cache lookup before invoking the LLM. This ensures quick responses and saves money on computation as well as LLM usage. The key steps are (a minimal sketch follows this item):

· The user asks a question.

· An embedding is created and a cache lookup is performed.

· If the information is available in the cache, the response is served from the cache; the LLM is not invoked.

· If the information is unavailable in the cache, the LLM is invoked and the response is stored in the cache.

o Use cases: All use cases such as Document discovery, Information retrieval, Chatbots, and Q&A.

o Key benefits: Speeds up performance, optimizes computational resources and LLM invocation cost.
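A minimal sketch of the semantic cache, again assuming sentence-transformers + FAISS; the 0.9 similarity threshold is an illustrative choice you would tune for your own data.

```python
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
cache_index = faiss.IndexFlatIP(384)    # embeddings of previously seen queries
cached_responses: list[str] = []        # responses, aligned with the index

def call_llm(prompt: str) -> str:
    # Placeholder for your LLM API.
    return "..."

def cached_answer(question: str, threshold: float = 0.9) -> str:
    q_emb = np.asarray(model.encode([question], normalize_embeddings=True), dtype="float32")
    if cache_index.ntotal > 0:
        sims, idx = cache_index.search(q_emb, 1)
        if sims[0][0] >= threshold:
            # Cache hit: a sufficiently similar query was answered before, skip the LLM.
            return cached_responses[idx[0][0]]
    # Cache miss: invoke the LLM, then store the query embedding and response.
    response = call_llm(question)
    cache_index.add(q_emb)
    cached_responses.append(response)
    return response
```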

The list doesn’t end here. A vector database works like a buddy to the LLM and helps you optimize security, cost, and performance across use cases. The solution needs to be designed around your business and your specific use cases.

If you are considering investing in a vector database, or planning to use the vector capabilities of an existing database like Redis, you should plan for multiple system design concerns related to vector databases and LLM use cases, including but not limited to:

· Keyword vs. semantic search: Keyword search is good for finding results that match specific terms, while semantic search is good for finding results that are relevant to the user’s intent. You may need a strategy to leverage the best of both, depending on your use cases (see the hybrid-search sketch after this list).

· Creating embeddings with cost and time efficiency at scale, without paying too much for GPUs (vs. CPUs) but also without introducing latency into the system.

· A strategy around multimodal search, which allows users to search for information using multiple modalities such as text, images, audio, or combinations of them.

· Also, think about whether your use case needs precise search results or exploratory results.

· Whether to invest in a new vector database or use dense_vector in Elasticsearch, OpenSearch, or Solr.

· Integration with existing ML models and MLOps. How will you ensure models keep performing well at increased scale? You may need to revisit the data pipeline and enable real-time streaming (Kafka/Kinesis/Flink), since real-time or near real-time predictions, fraud detection, recommendations, and search results will depend on it.

· There are business use case-driven issues to consider too: for example, in a marketplace scenario, if a seller adds a new product, how does the system treat it with respect to search and recommendations?

· Many more…
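On the keyword vs. semantic search point above, one common way to leverage the best of both is reciprocal rank fusion (RRF), which blends a keyword ranking with an embedding ranking. The sketch below uses a toy term-overlap scorer in place of a real BM25/Elasticsearch query, and k=60 is the conventional RRF constant, not something tied to any particular product.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "metallic finish burgundy and black formal leather shoes",
    "casual canvas sneakers with white laces",
    "dual-toned leather oxfords with colored laces, size 10",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embs = model.encode(documents, normalize_embeddings=True)

def keyword_rank(query: str) -> list[int]:
    # Toy keyword scorer: rank by overlapping terms (use BM25/Elasticsearch in practice).
    q_terms = set(query.lower().split())
    scores = [len(q_terms & set(d.lower().split())) for d in documents]
    return list(np.argsort(scores)[::-1])

def semantic_rank(query: str) -> list[int]:
    # Rank by cosine similarity of normalized embeddings.
    q_emb = model.encode([query], normalize_embeddings=True)[0]
    return list(np.argsort(doc_embs @ q_emb)[::-1])

def hybrid_search(query: str, k: int = 60) -> list[str]:
    # Reciprocal rank fusion: score(doc) = sum over rankings of 1 / (k + rank).
    fused: dict[int, float] = {}
    for ranking in (keyword_rank(query), semantic_rank(query)):
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return [documents[i] for i in sorted(fused, key=fused.get, reverse=True)]

print(hybrid_search("burgundy leather shoes with colored laces"))
```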

With the technology still evolving, it’s beneficial to have experts who can guide you through system design while preventing redundant future costs.

#chatgpt #llm #llmops #mlops #vectordatabase #embeddings #openai #googlebardai #bardai #metaai #alpaca #stanford #huggingface #pinecone #redis #milvus #ltm #cohere #collaborativefiltering #recommendationsystems #search #semanticsearch #frauddetection

(Note: This article was published on LinkedIn with slightly different content)


Prasun Mishra

Hands-on ML practitioner. AWS Certified ML Specialist. Kaggle expert. BIPOC DS Mentor. Working on interesting NLP use cases!