by Ralf Elsas
Overview
A toy example for RAG in MATLAB, using an LLM with Ollama, that:
– uses BM25 keyword-based scoring and vector embeddings from sentence-transformers, and
– merges the retrieval results using Reciprocal Rank Fusion (RRF).
Note that Ollama, which is based on llama.cpp, uses quantized versions of the LLMs to make them usable on CPUs and small GPUs.
———————
Based on the MathWorks blog by Sivylla Paraskevopoulou:
——————–
Note: You have to have Ollama installed and running locally (with the llama3.1 model pulled, e.g. via "ollama pull llama3.1") and the Large Language Models (LLMs) with MATLAB add-on set up.
system_prompt = "You are a helpful assistant. You might get a " + ...
    "context for each question, but only use the information " + ...
    "in the context if that makes sense to answer the question. ";
chat = ollamaChat("llama3.1",system_prompt);
Simple question to the LLM
Here, no RAG is used. The answer relies solely on the built-in knowledge of one of the most recent LLMs that can run even on a CPU or a small GPU, Llama 3.1.
query_simple = "What is the most famous Verdi opera?";
prompt_simple = "Answer the following question: " + query_simple;
response_simple = generate(chat,prompt_simple);
wrapText(response_simple)
RAG question
query_tech = "How to import a PyTorch model into MATLAB?";
prompt_tech = "Answer the following question: " + query_tech;
response_tech = generate(chat,prompt_tech);
++++++ RAG ++++++++++++++
Download source document
url = "https://blogs.mathworks.com/deep-learning/2024/04/22/convert-deep-learning-models-between-pytorch-tensorflow-and-matlab/";
% Define the local path and file name for the downloaded post (example values), create the folder if needed, and download the post.
localpath = "./data/";  filename = "pytorch_import_blog.html";
if ~exist(localpath, 'dir'), mkdir(localpath); end
websave(localpath+filename,url);
Read the text from the downloaded file by first creating a FileDatastore object and then reading its contents into a string array.
fds = fileDatastore(localpath,"FileExtensions",".html","ReadFcn",@extractFileText);
str = readall(fds);      % one extracted text string per file
str = vertcat(str{:});   % combine into a single string array
% Define a function for text preprocessing: split the text into paragraphs and tokenize them.
function [tokDocs, strDocs] = preprocessDocuments(str)
    strDocs = splitParagraphs(join(str));
    tokDocs = tokenizedDocument(strDocs);
end
% Split the text data into paragraphs and tokenize them.
[tokDocs, strDocs] = preprocessDocuments(str);
BM25 Ranking
Tokenize the query and compute BM25 similarity scores between the query and each document paragraph.
bm25Docs = bm25Similarity(tokDocs,tokenizedDocument(query_tech));
[~, idxBM25] = sort(bm25Docs,"descend");
display(strDocs(idxBM25(1:5)));
5×1 string array
    "Why Import Models into MATLAB?"
    "Import models from TensorFlow and PyTorch into MATLAB "
    "Trace the PyTorch model. For more information on how to trace a PyTorch model, go to Torch documentation: Tracing a function. Then, save the PyTorch model. "
    "This example shows you how to import an image classification model from PyTorch. The PyTorch model must be pretrained and traced. "
    "Import Models into MATLAB"
Sentence Embeddings
Get vector embeddings using a simple, small, but classic BERT-based sentence embedding model that is also implemented directly in MATLAB (the sentence transformer "all-MiniLM-L6-v2"), originally developed by Nils Reimers at TU Darmstadt (now at Cohere).
emb = documentEmbedding("Model","all-MiniLM-L6-v2");
embeddedDocuments = embed(emb,strDocs);
queryEmb = embed(emb,query_tech);
embDocs = cosineSimilarity(embeddedDocuments,queryEmb);
[~, idxEMB] = sort(embDocs,"descend");
display(strDocs(idxEMB(1:5)));
5×1 string array
    "Import models from TensorFlow and PyTorch into MATLAB "
    "Export models from MATLAB to TensorFlow and PyTorch "
    "What? Export models to TensorFlow. Export models to PyTorch. "
    "Import Models into MATLAB"
    "What? Import PyTorch models. Import TensorFlow models. Import PyTorch and Tensorflow models interactively. "
Reciprocal Rank Fusion
Determine RRF
This step of RAG (actually, of retrieval) takes the two rankings and combines them using a classic formula. Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set. It is a reranking algorithm that assigns each document a reciprocal rank score in every source, then combines those scores into one final reranked list. RRF requires no tuning, and the different relevance indicators do not have to be related to each other to achieve high-quality results.
The original article suggesting RRF is Cormack, Clarke, and Büttcher (2009), "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods", SIGIR 2009.
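In formula terms, each document d gets a score that sums the reciprocals of its constant-shifted ranks across all rankers, where the rankers here are BM25 and embedding similarity, and k is a smoothing constant (60 in the original paper and the default of the helper function below):

$$\mathrm{RRF}(d) = \sum_{r \in \mathcal{R}} \frac{1}{k + \mathrm{rank}_r(d)}$$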
hashes = "hash_"+[1:length(strDocs)]';
bm25Ranking = table(hashes(idxBM25),'VariableNames',["hash"]);
embRanking = table(hashes(idxEMB),'VariableNames',["hash"]);
resTable = reciprocal_rank_fusion(bm25Ranking,embRanking);
[~, idxRRF] = ismember(resTable.hash,hashes);   % map the RRF-ranked hashes back to document indices
display(strDocs(idxRRF(1:5)));
5×1 string array
    "In this blog post we are going to show you how to use the newest MATLAB functions to: "
    "Import models from TensorFlow and PyTorch into MATLAB "
    "Export models from MATLAB to TensorFlow and PyTorch "
    "This is a brief blog post that points you to the right functions and other resources for converting deep learning models between MATLAB, PyTorch®, and TensorFlow™. Two good resources to get started with are the documentation topics Interoperability Between Deep Learning Toolbox, TensorFlow, PyTorch, and ONNX and Tips on Importing Models from TensorFlow, PyTorch, and ONNX. "
    "If you have any questions about the functionality presented in this blog post or want to share the exciting projects for which you are using model conversion, comment below. "
Prepare best docs for LLM (retrieval)
Collect the top-ranked documents (in RRF order) until a word limit for the context is reached.
% Iterate over the RRF-ranked document indices until the word limit is reached.
limitWords = 1000; totalWords = 0; selectedDocs = []; i = 1;   % limitWords is an example context budget
while totalWords <= limitWords && i <= length(idxRRF)
    totalWords = totalWords + size(tokDocs(idxRRF(i)).tokenDetails,1);
    selectedDocs = [selectedDocs; strDocs(idxRRF(i))];
    i = i + 1;
end
RAG answer
Define the prompt for the chatbot with added technical context, and generate a response.
prompt_rag = "Context:" + join(selectedDocs, " ") ...
    + newline + "Answer the following question: " + query_tech;
response_rag = generate(chat, prompt_rag);
Side-note: Sometimes, no RAG is the answer
Since the source document here is short, its full text can simply be passed to the LLM together with the question, without any retrieval step.
response_direct = generate(chat, query_tech + " Answer this question based on the following input" + newline + "###:" + newline + join(str));
wrapText(response_direct)
Extensions
As mentioned before, this is at best a toy example, as one could have fed the whole input document (the blog post on the PyTorch import) directly into the LLM. The power of RAG comes with huge sets of documents that need to be searched and then condensed into a concise answer using the LLM. For such an application, one needs the following extensions:
- Database: Documents (or documents stored as chunks of text) should be stored in a database that supports BM25 and vector search. One simple solution is to run OpenSearch in a Docker container (see the first sketch after this list); a primer on the installation and usage can be found in my old blog post.
- Framework: With its most recent functionality, MATLAB offers all core elements to set up RAG, but more advanced chunking techniques, GraphRAG, PDF parsing, etc. will require self-programming efforts and potentially the use of Python tools from MATLAB. Note that nowadays there are many NLP search databases and frameworks out there to set up a full RAG pipeline; these, however, mostly use Python. As a personal, very subjective recommendation, take a look at Vespa, for example – one of the most powerful and flexible providers with SOTA implementations. Txtai, Haystack, and LlamaIndex are also very nice framework implementations.
- Parsing: Despite all the fancy tools for RAG, the most important step is loading your text data and the information to be searched with high quality. If you have ever tried parsing a PDF of a scientific paper, an analyst report, etc., you will know the immense difficulties of getting this seemingly trivial problem right, in particular when one has to deal with text, tables, and graphs in a document. There has been a lot of progress recently, however. Tools like Marker, LlamaParse, and Nougat are promising, in particular because (at least the first two mentioned) are maintained and improved systematically.
- Embeddings: Although BM25 is an old technique, it is still a very powerful benchmark for all retrieval purposes. Combined (!) with vector search based on embeddings from language models like BERT, Llama, etc., retrieval works almost as well as Google. Ever since the introduction of sentence transformers in a seminal paper by Nils Reimers and Iryna Gurevych in 2019, there have been big advances in retrieval methods with embeddings; see for example HuggingFace's text embedding benchmark, or ColBERT and its multimodal extensions like ColPali. Using these methods from MATLAB can be done via the MATLAB/Python interface (see the second sketch after this list).
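As a minimal sketch of the database extension: the snippet below sends a BM25-style "match" query to a local OpenSearch instance via its REST API. The host and port, the index name "docs", and the field name "content" are assumptions chosen for illustration; a real setup also needs the security settings (HTTPS, credentials) configured.

% Hypothetical example: BM25 "match" query against a local OpenSearch index.
% Assumes OpenSearch listens on localhost:9200 with an index "docs" that has a
% text field "content" (names chosen for illustration; security settings omitted).
queryBody = struct("size", 5, ...
    "query", struct("match", struct("content", "import a PyTorch model into MATLAB")));
opts = weboptions("MediaType","application/json","RequestMethod","post");
res  = webwrite("http://localhost:9200/docs/_search", queryBody, opts);
hits = res.hits.hits;   % top documents with their BM25-based _score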
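And a minimal sketch of calling a Python embedding model from MATLAB, assuming the sentence-transformers package is installed in the Python environment configured via pyenv (the conversion of the returned NumPy array may differ slightly between MATLAB releases):

% Hypothetical example: compute a sentence embedding with a Python model from MATLAB.
st    = py.importlib.import_module("sentence_transformers");
model = st.SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2");
vec   = double(model.encode("How to import a PyTorch model into MATLAB?"));  % 384-dimensional embedding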
++++++++++ Helper functions
% Define a function that wraps text, which you can use to make the generated text easier to read.
function wrappedText = wrapText(text)
    wrappedText = splitSentences(text);
    wrappedText = join(wrappedText,newline);
end
function resultTable = reciprocal_rank_fusion(bm25Ranking, cosineRanking, k)
% Reciprocal Rank Fusion (RRF) with normalization and handling of different list sizes.
% Outputs a single table with hashes and normalized scores based on the combined rankings.
%   bm25Ranking   – A table with column 'hash' for BM25 rankings
%   cosineRanking – A table with column 'hash' for cosine-similarity rankings
%   k             – The constant parameter for RRF, defaults to 60 if not provided
%   resultTable   – A table containing 'hash' and 'normalizedScore' columns
if nargin < 3
    k = 60;   % default RRF constant
end
% Extract unique hashes from both rankings
allHashes = unique([bm25Ranking.hash; cosineRanking.hash]);
% Create containers for rank positions, initialized to inf
bm25Ranks = inf(length(allHashes), 1);
cosineRanks = inf(length(allHashes), 1);
% Map the hashes to their ranks in the BM25 and cosine rankings
[~, bm25Idx] = ismember(allHashes, bm25Ranking.hash);
[~, cosineIdx] = ismember(allHashes, cosineRanking.hash);
% Assign the ranks where applicable
bm25Ranks(bm25Idx > 0) = bm25Idx(bm25Idx > 0);
cosineRanks(cosineIdx > 0) = cosineIdx(cosineIdx > 0);
% Compute the combined RRF scores using vectorized operations
combinedScores = (1 ./ (k + bm25Ranks)) + (1 ./ (k + cosineRanks));
% Normalize the combined scores to the range [0, 1]
maxRRFScore = 2 * (1 / (k + 1)); % maximum possible RRF score
normalizedScores = combinedScores / maxRRFScore;
% Create the result table with 'hash' and 'normalizedScore' columns
resultTable = table(allHashes, normalizedScores, 'VariableNames', {'hash', 'normalizedScore'});
% Sort the result table by normalized scores in descending order
resultTable = sortrows(resultTable, 'normalizedScore', 'descend');
end