Example RAG Matlab (with BM25, Vector Embeddings, and Reciprocal Rank Fusion)

by Ralf Elsas

Overview

A toy example for RAG in Matlab, using a LLM with Ollama, and:

– Uses BM25 keyword-based scoring and vector embeddings from sentence-transformers.

– Retrieval results are merged using Reciprocal Rank Fusion (RRF).

Note that Ollama uses quantized versions of the LLMs to make them usable for CPU and GPU based on llama.cpp.

———————

Build on the Mathworks blog by Sivylla Paraskeopoulou:

Local LLMs with MATLAB » Artificial Intelligence – MATLAB & Simulink (mathworks.com)

——————–

Note: You have to have

Text Analytics Toolbox (since Matlab 2024a)
„llms-with-matlab“ toolbox (GitHub – matlab-deep-learning/llms-with-matlab: Connect MATLAB to LLM APIs, including OpenAI® Chat Completions, Azure® OpenAI Services, and Ollama™)
Ollama installed (ollama.com), with some LLM like LLama3.1 downloaded.

% Set-up system prompt

system_prompt = „You are a helpful assistant. You might get a “ + …

 „context for each question, but only use the information “ + …

 „in the context if that makes sense to answer the question. „;

chat = ollamaChat(„llama3.1“,system_prompt);

Simple question LLM

Here, no RAG is used It’s simply knowledge of one of the most recent LLM that can run even on CPU or small GPU, Llama3.1.

query_simple = „What is the most famous Verdi opera?“;

prompt_simple = „Answer the following question: „+ query_simple;

% Generate a response.

response_simple = generate(chat,prompt_simple);

wrapText(response_simple)

ans = 
„The most famous Verdi opera is likely „La Traviata“.
This opera, which premiered in 1853, is a romantic tragedy based on a novel by Alexandre Dumas and features some of Verdi’s most beloved music, including „Sempre Libera“ and „Adeuse““

RAG question

query_tech = „How to import a PyTorch model into MATLAB?“;

prompt_tech = „Answer the following question: „+ query_tech;

response_tech = generate(chat,prompt_tech);

wrapText(response_tech)

ans = 
„Unfortunately, importing a PyTorch model directly into MATLAB is not straightforward due to fundamental differences between the two frameworks.
However, you can use the `onnx` library in Python (available via pip with „`pip install onnx„`) and the `ontrix` package in MATLAB to convert your model from PyTorch .
pth file into an ONNX (.
onen) file which can be then read directly by MATLAB.
Here are basic steps:
1.
**Convert PyTorch Model to ONNX:**
First, you need to save your PyTorch model as an ONNX model.
„`python
# Imports the necessary modules
import onnx
# Load your saved PyTorch model or define a new instance to convert
loaded_model = torch.
load(„/path/to/model.
pth“, map_location=torch.
device(‚cpu‘))
# Save it in ONNX format which MATLAB’s ontrix can read.
torch.
onnx.
export(loaded_model, torch.
randn(1, 3, 224, 224), „model.
onnx“)
„`
2.
**Use `ontrix.
matlab_interface` Function in MATLAB to Load the ONNX Model:**
After converting your PyTorch model into an ONNX format, you can load it using `ontrix`, a third-party package that allows for conversion and interaction of models between different environments (though note that usage or support might be limited outside MATLAB for this library).
First, you need to download the `OnnXMLReader.
m` file from [here](http://research.mathworks.com/).
Place it into your path.
Then in MATLAB:
„`matlab
OnnXML_Reader = onnxml matlab interface
model_path = onreadnet(‚OnnnPath‘,’model.
onnx‘);
„`
This will allow you to load the ONNX version of your model, and work with it inside MATLAB.
3.
**Run Predictions in MATLAB:**
You can then use the loaded ONNX mode in MATLAB:
„`matlab
% Define inputs as a batch of images
X = onnninput(mdl_path);
% Get the number of outputs
num_outputs = mdl.
path.
numout;
% Run predictions
Y = onpredict(X,mdl_path);
„`
Note that this example is meant to guide you through a process that should work for most simple models but might require adjustments for more complex scenarios.
Additionally, since `onntrix` is not as widely supported or documented as PyTorch and ONNX themselves, you might need to dig into MATLAB forums or seek additional resources if you encounter any issues.“

++++++ RAG ++++++++++++++

Download source document

url = „https://blogs.mathworks.com/deep-learning/2024/04/22/convert-deep-learning-models-between-pytorch-tensorflow-and-matlab/“;

% Define the local path where the post will be saved, download it using the provided URL, and save it to the specified local path.

localpath = „./data/“;

if ~exist(localpath, ‚dir‘)

 mkdir(localpath);

end

filename = „blog.html“;

websave(localpath+filename,url);

Read the text from the downloaded file by first creating a FileDatastore object.

fds = fileDatastore(localpath,„FileExtensions“,„.html“,„ReadFcn“,@extractFileText);

str = [];

while hasdata(fds)

 textData = read(fds);

 str = [str; textData];

end

% Define a function for text preprocessing.

function [tokDocs, strDocs] = preprocessDocuments(str)

 paragraphs = splitParagraphs(join(str));

 tokDocs = tokenizedDocument(paragraphs);

 strDocs = paragraphs;

end

% Split the text data into paragraphs.

[tokDocs, strDocs] = preprocessDocuments(str);

BM25 Ranking

Tokenize the query and find similarity scores between the query and document.

bm25Docs = bm25Similarity(tokDocs,tokenizedDocument(query_tech));

[~, idxBM25] = sort(bm25Docs,„descend“);

display(strDocs(idxBM25(1:5)));

5×1 string array“Why Import Models into MATLAB?“
„Import models from TensorFlow and PyTorch into MATLAB “
„Trace the PyTorch model. For more information on how to trace a PyTorch model, go to Torch documentation: Tracing a function. Then, save the PyTorch model. “
„This example shows you how to import an image classification model from PyTorch. The PyTorch model must be pretrained and traced. “
„Import Models into MATLAB“

Sentence Embeddings

Get vector embeddings using a simple, small, but classic BERT-based sentence embedding model, directly implemented also in Matlab (sentence transformer „all-MiniLM-L6-v2), initially developed by Nils Reimers at TU Darmstadt (now Cohere).

emb = documentEmbedding(„Model“,„all-MiniLM-L6-v2“);

embeddedDocuments = embed(emb,strDocs);

queryEmb = embed(emb,query_tech);

embDocs = cosineSimilarity(embeddedDocuments,queryEmb); 

[~, idxEMB] = sort(embDocs,„descend“);

display(strDocs(idxEMB(1:5)));

5×1 string array“Import models from TensorFlow and PyTorch into MATLAB “
„Export models from MATLAB to TensorFlow and PyTorch “
„What? Export models to TensorFlow. Export models to PyTorch. “
„Import Models into MATLAB“
„What? Import PyTorch models. Import TensorFlow models. Import PyTorch and Tensorflow models interactively. „

Reciprocal Rank Fusion

Determine RRF

This step of RAG (actually, retrieval) takes the two rankings and uses a classic formula to bring the two rankings together. Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set. It is a reranking algorithm that gives a reciprocal rank to documents in multiple sources, then combines those ranks and documents into one final reranked list. RRF requires no tuning, and the different relevance indicators do not have to be related to each other to achieve high-quality results.

The original article suggesting RRF is:

Cormack, G. / Clarke, C. / Büttcher, S. (2009): Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods SIGIR’09, July 19–23, 2009.

hashes = „hash_“+[1:length(strDocs)]‘;

bm25Ranking = table([hashes(idxBM25)],‚VariableNames‘,[„hash“]);

embRanking = table([hashes(idxEMB)],‚VariableNames‘,[„hash“]);

resTable = reciprocal_rank_fusion(bm25Ranking,embRanking);

idxRRF = find(ismember(resTable.hash,hashes));

display(strDocs(idxRRF(1:5)));

5×1 string array“In this blog post we are going to show you how to use the newest MATLAB functions to: “
„Import models from TensorFlow and PyTorch into MATLAB “
„Export models from MATLAB to TensorFlow and PyTorch “
„This is a brief blog post that points you to the right functions and other resources for converting deep learning models between MATLAB, PyTorch®, and TensorFlow™. Two good resources to get started with are the documentation topics Interoperability Between Deep Learning Toolbox, TensorFlow, PyTorch, and ONNX and Tips on Importing Models from TensorFlow, PyTorch, and ONNX. “
„If you have any questions about the functionality presented in this blog post or want to share the exciting projects for which you are using model conversion, comment below. „

Prepare best docs for LLM (retrieval)

Sort the documents in descending order of similarity scores.

limitWords = 1000;

selectedDocs = [];

totalWords = 0;

% Iterate over the sorted document indices until the word limit is reached.

i = 1;

while totalWords <= limitWords && i <= length(idxRRF)

 totalWords = totalWords + size(tokDocs(idxRRF(i)).tokenDetails,1);

 selectedDocs = [selectedDocs; strDocs(idxRRF(i))];

 i = i + 1;

end

RAG answer

Define the prompt for the chatbot with added technical context, and generate a response.

prompt_rag = „Context:“ + join(selectedDocs, “ „) …

 + newline +„Answer the following question: „+ query_tech;

response_rag = generate(chat, prompt_rag);

wrapText(response_rag)

ans = 
„You can import a PyTorch model into MATLAB using the `importNetworkFromPyTorch` function, which allows you to convert a PyTorch model to a MATLAB network with just one line of code.
The function takes as input the path to a PyTorch model file (e.g., „mnasnet1_0.
pt“) and automatically creates and adds the input layer for a batch of images if you specify the `PyTorchInputSizes` name-value argument.“

Side-note: Sometimes, no RAG is the answer

response_direct = generate(chat, query_tech+“ Answer this question based on the following input“+newline+„###:“+newline+str);

wrapText(response_direct)

ans = 
„Based on the provided context, here is a concise and accurate answer to the question:
To import a PyTorch model into MATLAB, you can use the `importNetworkFromPyTorch` function.
This function allows you to import a pre-trained PyTorch model, along with its weights and architecture, directly into MATLAB.
You will need to have the saved PyTorch model file (e.g., „mnasnet1_0.
pt“) available.
Specifically, you can use the following code in MATLAB:
„`matlab
net = importNetworkFromPyTorch(‚path/to/model/mnasnet1_0.
pt‘);
„`
Make sure to replace `’path/to/model/mnasnet1_0.
pt’` with the actual file path.
Also, keep in mind that different models may have different file names, so make sure to use the correct name for your model.
Once imported, you can use the PyTorch model as a regular MATLAB network, allowing you to perform various tasks such as transfer learning, visualization of activations, explanations, and verification with Deep Learning Toolbox Verification Library.“

Extensions

As mentioned before, this is at best toy example as one could have fed the whole input document (the blog on the Pytorch import) directly into the LLM. The power of RAG comes with huge sets of documents that need to be searched and then condensed to a concise answer using the LLM. For such an application, one needs the following extensions:

Database: Documents (or documents stored as chunks of text) should be stored in an database that supports BM25 and vector search. One simple solution is to run OpenSearch in a Docker container. A primer on the installation and usage can be found in my old blog post.
Framework: Due to its most recent functionalities, Matlab offers all core elements to set-up RAG, but more advanced chunking techniques, GraphRAG, and pdf-parsing etc. will require self-programming efforts and potentially usage of Python tools from Matlab. Note that nowadays there are many NLP search databases and frameworks out there to set-up a full RAG pipeline, These use, however, mostly Python. As personal, very subjective recommendations take a look at Vespa for example – one of the most powerful and flexible providers with SOTA implementations. Txtai, Haystack, LlamaIndex are also very nice framework implementations.
Parsing: Despite all fancy tools for RAG, the most important one is loading your text data and information to be searched with high quality. If you’ve ever tried parsing a pdf of a scientific paper, an analyst report etc., you will know the immense difficulties of getting this seemingly trivial problem right, in particular whenever one has to deal with text, tables, and graphs in a document. There has been a lot of progress recently, however. Tools like Marker, LlamaParse and Nougat are promising, in particular because (at least the first two mentioned) are getting maintained and improved systematically,
Embeddings: Altough BM25 is an old technique, it is still a very powerful benchmark for all retrieval purposes. Combined (!) with vector search based on embeddings by language models like BERT, LLama, etc., retrieval works almost as good as Google. Ever since the introduction of sentence transformers in a seminal paper by Nils Reimers and Irina Gurevych in 2019, there have been big advances in retrieval methods with embeddings, see for example HuggingFace’s text embedding benchmark, or Colbert and its multimodal extensions like ColPali. Using these methods can be done by using Matlab /Python interaction.

++++++++++ Helper functions

% Define a function that wraps text, which you can use to make the generated text easier to read.

function wrappedText = wrapText(text)

 wrappedText = splitSentences(text);

 wrappedText = join(wrappedText,newline);

end

function resultTable = reciprocal_rank_fusion(bm25Ranking, cosineRanking, k)

 % Reciprocal Rank Fusion (RRF) with normalization and handling of different list sizes

 % Outputs a single table with hashes and normalized scores based on the

 % ranks

%

 % Inputs:

 % bm25Ranking – A table with column: ‚hash‘ for BM25 rankings

 % cosineRanking – A table with column: ‚hash‘ for Cosine Similarity rankings

 % k – The constant parameter for RRF, default to 60 if not provided

%

 % Outputs:

 % resultTable – A table containing ‚hash‘ and ’normalizedScore‘ columns

arguments

 bm25Ranking table

 cosineRanking table

 k double=60

end

 % Extract unique hashes from both rankings

 allHashes = unique([bm25Ranking.hash; cosineRanking.hash]);

 % Create containers for rank positions, initialized to inf

 bm25Ranks = inf(length(allHashes), 1);

 cosineRanks = inf(length(allHashes), 1);

 % Map the hashes to their ranks in the BM25 and cosine rankings

 [~, bm25Idx] = ismember(allHashes, bm25Ranking.hash);

 [~, cosineIdx] = ismember(allHashes, cosineRanking.hash);

 % Assign the ranks where applicable

 bm25Ranks(bm25Idx > 0) = bm25Idx(bm25Idx > 0);

 cosineRanks(cosineIdx > 0) = cosineIdx(cosineIdx > 0);

 % Compute the combined RRF scores using vectorized operations

 combinedScores = (1 ./ (k + bm25Ranks)) + (1 ./ (k + cosineRanks));

 % Normalize the combined scores to the range [0, 1]

 maxRRFScore = 2 * (1 / (k + 1)); % maximum possible RRF score

 normalizedScores = combinedScores / maxRRFScore;

 % Create the result table with ‚hash‘ and ’normalizedScore‘ columns

 resultTable = table(allHashes, normalizedScores, ‚VariableNames‘, {‚hash‘, ’normalizedScore‘});

 % Sort the result table by normalized scores in descending order

 resultTable = sortrows(resultTable, ’normalizedScore‘, ‚descend‘);

end