Using OpenSearch with Matlab
by Ralf Elsas and Leo Schwarze (10/06/2022)
1. Introduction
This guide explains how to install OpenSearch locally and on a server and how to use it with Matlab. We cover the installation with Docker on Windows 10 and discuss the pitfalls and tricks we learned while getting it to run. If you have further tips or ideas to improve the workflow, we would be happy to hear from you.
It might be beneficial to check the following websites first:
- https://opensearch.org/docs/latest/opensearch/index/
- https://opensearch.org/docs/latest/opensearch/install/index/
2. Step-by-Step Installation Guide
2.1 Docker & OpenSearch
Install Docker Desktop from: https://www.docker.com/products/docker-desktop
- Set up the docker compose file (or download the attached example file):
- Create a new .txt file
- Write the docker compose file, or copy it from the OpenSearch documentation or from Appendix A of this manual. (If you copy it from the website and it does not work, the changes I made to the file, listed in Appendix A, may help.)
- Save it and name it "docker-compose.yml" (delete the .txt ending), or use the provided docker-compose.yml
- In preparation for the next step, I advise watching the following video and/or reading the following website. Both deal with Open Distro for Elasticsearch, but the procedure is very close to the OpenSearch installation. If you want to replicate what is done in the video, you only have to replace two terms: change "elasticsearch" to "opensearch" and "opendistro_security" to "opensearch-security". The video also gives a very insightful explanation of how to use Docker and the framework it creates, for example to change passwords. It makes sense to go through this exercise yourself to learn about Docker.
- https://www.youtube.com/watch?v=ta2_N-7VX8w
- https://middlewaretechnologies.in/2021/07/how-to-install-and-configure-opendistro-elasticsearch-using-docker-compose-and-update-the-authentication-settings.html
Open a command prompt and run the following commands one after another:
- cd <DIRECTORY_OF_DOCKER_COMPOSE.YML>
- wsl -d docker-desktop sysctl -w vm.max_map_count=262144
- docker compose up
- Check if OpenSearch is running. Open the following addresses in a browser and provide the default credentials (user = "kibanaserver" / pw = "kibanaserver"):
- Dashboards: http://localhost:5601/app/login?nextUrl=%2F
This should bring up the Dashboards web app. (You can try to create new users or change the password of the admin user, e.g. under "Security" > "Internal Users". Then log out and log in again with the new user or the new password.)
- OpenSearch: https://localhost:9200/
This should display a short JSON document with information about the node, such as the cluster name and the OpenSearch version.
OpenSearch should now be running on the respective machine. You can also check in the Docker Desktop app whether the cluster and everything within it is shown in green. Now we want to access OpenSearch. The next section therefore explains how to access the cluster locally or remotely.
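The browser check can also be scripted. The following is a minimal sketch that uses only the Python standard library. It assumes the admin/admin credentials used later in this guide and a self-signed demo certificate, which is why certificate verification is switched off. The function names are ours and purely illustrative.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user, secret):
    """Return the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def cluster_url(host="localhost", port=9200, secure=True):
    """Build the base URL; 'secure' selects https (security on) or http."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/"


def fetch_cluster_info(user="admin", secret="admin",
                       host="localhost", port=9200, secure=True):
    """Request the root endpoint and return the parsed JSON response."""
    request = urllib.request.Request(cluster_url(host, port, secure))
    request.add_header("Authorization", basic_auth_header(user, secret))
    context = None
    if secure:
        # Assumption: self-signed demo certificate, so verification is
        # disabled. Do not do this against a production server.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=10, context=context) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers running, calling fetch_cluster_info() should return a dictionary that contains, among other things, the cluster name and version. If it raises a connection error, the cluster is probably not up yet.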
2.2 Python Environment & Matlab
This section explains how to set up the required Python environment and use it in Matlab on the computer from which you want to access the server:
Set up environment.yml (or download the attached example file):
- Create a new .txt file
- Write the environment file, or copy it from Appendix B of this manual.
- Save it and name it "environment.yml" (delete the .txt ending), or use the provided environment.yml.
Download Miniconda, e.g. from: https://www.anaconda.com/products/individual#windows
- Open the Anaconda command prompt and run the following commands one after the other (a helpful website on managing conda environments: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html):
- cd <DIRECTORY_OF_ENVIRONMENT.YML>
- conda env create -f environment.yml (this creates a conda environment named "opensearch"; its Python interpreter is "<ANACONDA_ROOT_DIRECTORY>\envs\opensearch\python.exe")
- conda env list (check that the opensearch environment is listed)
- conda activate opensearch (activates the environment)
- matlab (opens Matlab within the activated environment; in our experience this is indispensable (!) for Matlab to recognize the Python environment)
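To double-check which interpreter the activated environment provides (this is the path Matlab needs), one option is to ask Python itself. A minimal sketch, to be run from the activated environment; the function name is ours:

```python
import os
import sys


def interpreter_info():
    """Collect basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,  # full path to python(.exe)
        "version": "{}.{}".format(*sys.version_info[:2]),
        # CONDA_DEFAULT_ENV is set by 'conda activate'; None outside conda.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the printed executable does not point into the opensearch environment, the environment was probably not activated in that prompt.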
Communication between Matlab and OpenSearch is implemented via a Python wrapper function called opensearch.m. There may be more elegant ways (see for example the mlelasticsearch function on the MathWorks File Exchange); we would appreciate any improvements and comments.
The downloads include an example script for testing whether OpenSearch works. It is an illustrative example containing a few Form 8-K firm disclosures from the U.S., together with SBERT embeddings of the documents and of the example queries. (In Matlab, make sure that opensearch.m is in the current folder or that its folder is on the path.)
- "opensearch_example.m" (it can make sense to set the Python environment in the Matlab startup.m file: pyenv('Version','<ANACONDA_ROOT_DIRECTORY>\envs\opensearch\python.exe','ExecutionMode','InProcess');)
The example file creates an index in OpenSearch, uploads the data and runs two example queries: one based on the BM25 keyword search algorithm and one vector search that relies on sentence embeddings, which enable semantic search.
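To give an idea of what the two query types look like in the OpenSearch query DSL, here is a minimal sketch that builds the request bodies as Python dictionaries. It is not taken from opensearch_example.m. The field names "text" and "embedding" in the usage note are hypothetical, and the vector dimension depends on the SBERT model used.

```python
def bm25_query(field, text, size=5):
    """Keyword search: a 'match' query, which is scored with BM25 by default."""
    return {"size": size, "query": {"match": {field: {"query": text}}}}


def knn_query(field, vector, k=5):
    """Vector search: a k-NN query against a field of type 'knn_vector'."""
    return {"size": k, "query": {"knn": {field: {"vector": list(vector), "k": k}}}}


def knn_index_body(field, dimension):
    """Index settings and mappings that enable k-NN search on one field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dimension}}
        },
    }
```

For example, bm25_query("text", "merger agreement") would search a hypothetical "text" field by keywords, while knn_query("embedding", query_vector, k=5) would look for the five documents whose stored embeddings are closest to the embedding of the query.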
To connect to OpenSearch from Matlab:
- locally: run credentials = struct(); credentials.user = 'admin'; credentials.secret = 'admin'; os = opensearch('loc', credentials)
- remotely: run credentials = struct(); credentials.user = 'admin'; credentials.secret = 'admin'; credentials.ip = '<SERVER_IP>'; os = opensearch('loc', credentials)
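For readers who want to connect from Python directly, the sketch below shows how the same credentials could be turned into a client connection. It is not the content of opensearch.m. It assumes that the opensearch-py package is installed in the environment and that the cluster uses a self-signed demo certificate. The function names are ours.

```python
def client_settings(credentials, port=9200):
    """Translate a credentials mapping into keyword arguments for a client."""
    host = credentials.get("ip", "localhost")  # no 'ip' means local cluster
    return {
        "hosts": [{"host": host, "port": port}],
        "http_auth": (credentials["user"], credentials["secret"]),
        "use_ssl": True,
        "verify_certs": False,  # assumption: self-signed demo certificate
        "ssl_show_warn": False,
    }


def connect(credentials):
    """Create a client; requires the opensearch-py package."""
    from opensearchpy import OpenSearch  # imported lazily on purpose

    return OpenSearch(**client_settings(credentials))
```

A call such as connect({"user": "admin", "secret": "admin"}).info() should then return the same cluster information as the browser check in Section 2.1. Adding an "ip" entry selects a remote server.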
3. Additional Comments
There are several problems we encountered:
The docker compose file in Appendix A enables the security features, so OpenSearch is reached via https with authentication. If you run OpenSearch without the security features, the host address starts with http instead of https. This can be an issue when using the provided Matlab or Python scripts.
The OpenSearch documentation often states the default locations of specific files. However, these are directories inside the Docker container. If you want to make changes to files associated with OpenSearch, follow these steps (please also refer to the YouTube video for instructions: https://www.youtube.com/watch?v=ta2_N-7VX8w):
- Open a command prompt and run "docker ps"
- Copy the ID of the container in which the respective file is stored and run "docker exec -it <CONTAINER_ID> /bin/bash"
- "pwd" shows the directory you are currently in
- "ls -ltr" shows what is inside the directory
- Now you can cd into any directory within that container
- "view <FILENAME>" lets you view and edit the files in the directory (view typically opens the vi editor in read-only mode; if saving fails, open the file with "vi <FILENAME>" instead)
- To leave the file without saving, type ":q"; to save and exit, type ":wq"
- To leave the container, type "exit"
APPENDIX
A: Example docker-compose.yml
Below you will find the docker-compose.yml I used to set up the OpenSearch installation. The changes I made to the original docker-compose.yml from the website are:
- Change the "soft"/"hard" limits from 65536 to 262144
- Delete the .pem etc. at the end of all volumes
- Change "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" to "OPENSEARCH_JAVA_OPTS=-Xms10g -Xmx10g" in the node-1 section
Please find a docker-compose.yml that includes these changes in the download section.
B: Example OpenSearch environment.yml
C: Downloads
This zip-file includes all files mentioned above. As an overview:
- yml-files (docker-compose, environment)
- opensearch_example.m (m-script to illustrate login, index generation, BM25 and vector-search queries)
- test data (8-K disclosures and SBERT embeddings, example queries with embeddings)
- opensearch.m (wrapper function that sends OpenSearch commands from Matlab to Python)
D: Summary of all links
- Docker Desktop: https://www.docker.com/products/docker-desktop
- OpenSearch Documentation: https://opensearch.org/docs/latest/opensearch/install/index/
- Installation Video: https://www.youtube.com/watch?v=ta2_N-7VX8w
- Verbal Installation Documentation: https://middlewaretechnologies.in/2021/07/how-to-install-and-configure-opendistro-elasticsearch-using-docker-compose-and-update-the-authentication-settings.html
- Max Map Count: https://stackoverflow.com/questions/42111566/elasticsearch-in-windows-docker-image-vm-max-map-count
- Anaconda for Windows: https://www.anaconda.com/products/individual#windows
- Manage Anaconda Environments: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
- Sentence Transformers SBERT: https://www.sbert.net/
- ElasticSearch Matlab: https://de.mathworks.com/matlabcentral/fileexchange/73695-matlab-elasticsearch-mlelasticsearch