🗃️Vector Stores

A vector store is a specialized database for storing and retrieving high-dimensional numerical vectors. It efficiently manages and indexes these vectors for fast similarity searches.

1)AstraDB

Setup

1. Register an account on AstraDB

2. Log in to the portal and create a database.

3. Choose Serverless (Vector), then fill in the Database name, Provider, and Region.

4. After the database has been set up, copy the API Endpoint and generate an Application Token.

5. Create a new collection and select the desired dimension and similarity metric:

6. Back on the THub canvas, drag and drop the Astra node. Click Create New from the Credentials dropdown:

7. Specify the API Endpoint and Application Token:

8. You can now upsert data to AstraDB.

9. Navigate back to the Astra portal and open your collection to see all the data that has been upserted:

10. Start querying!

2)Chroma

Prerequisite

1. Download and install Docker and Git.

2. Clone Chroma's repository from your terminal.

3. Change directory to your cloned Chroma repository.

4. Run docker compose to build the Chroma image and start the container.

5. If successful, you will see the Docker image spun up:

Setup

Additional

1. If you are running both THub and Chroma on Docker, there are additional steps involved.

2. Open docker-compose.yml in THub:

cd THub && cd Docker

3. Modify the file to:

4. Spin up the THub Docker image.

5. For the Chroma URL, specify http://host.docker.internal:8000 on Windows and macOS. On Linux-based systems, use the default Docker gateway instead, since host.docker.internal is not available: http://172.17.0.1:8000
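The URL choice in the last step can be sketched as a small helper. This is only an illustration: the function name `chroma_url_for_host` is hypothetical and not part of THub or Chroma, and the two addresses are the ones given above. Pass the operating system of the Docker host, not of the container.

```python
import platform


def chroma_url_for_host(host_os: str, port: int = 8000) -> str:
    """Pick the Chroma URL based on the operating system of the Docker host.

    host_os uses the values returned by platform.system() on the host:
    "Windows", "Darwin" (macOS), or "Linux".
    """
    if host_os in ("Windows", "Darwin"):
        # Docker Desktop exposes the host under this special DNS name.
        return f"http://host.docker.internal:{port}"
    # On Linux, fall back to the default Docker bridge gateway.
    return f"http://172.17.0.1:{port}"


# Run on the host machine to see which URL to paste into the Chroma node.
print(chroma_url_for_host(platform.system()))
```

If your Docker network uses a non-default bridge, the gateway address may differ from 172.17.0.1.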

3)Elastic

Prerequisite

1. You can use the official Docker image to get started, or you can use Elastic Cloud, Elastic's official cloud service. This guide uses the cloud version.

2. Register an account or log in with an existing account on Elastic Cloud.

3. Click Create deployment. Then name your deployment and choose the provider.

4. After the deployment is finished, you should see the setup guides. Click the Set up vector search option.

5. You should now see the Getting started page for Vector Search.

6. On the left-hand sidebar, click Indices, then create a new index.

7. Select the API ingestion method.

8. Name your search index, then click Create Index.

9. After the index has been created, generate a new API key. Take note of both the generated API key and the URL.

Setup

1. Add a new Elasticsearch node on the canvas and fill in the Index Name.

2. Add a new credential via Elasticsearch API.

3. Take the URL and API Key from Elasticsearch and fill in the fields.

4. After the credential has been created successfully, you can start upserting data.

5. After the data has been upserted successfully, you can verify it from the Elastic dashboard:

6. Voila! You can now start asking questions in the chat.

4)Faiss

Upsert embedded data and perform similarity search upon query using the Faiss library from Meta.

5)In-Memory Vector Store

An in-memory vector store that stores embeddings and performs an exact, linear search for the most similar embeddings.
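As a rough illustration of what an exact, linear search means, the sketch below compares the query against every stored vector and keeps the best matches. It is not the node's actual implementation: the function names, the toy 3-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def linear_search(query, store, k=2):
    """Score every stored vector against the query and return the top-k ids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy store: document id -> embedding (real embeddings have far more dimensions).
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
top = linear_search([1.0, 0.0, 0.0], store, k=2)
print(top)
```

Because every vector is scored, the result is exact, but the cost grows linearly with the number of stored embeddings, which is why this approach suits small datasets and prototyping.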

6)Milvus

Upsert embedded data and perform similarity search upon query using Milvus, an open-source vector database.

7)MongoDB Atlas

Upsert embedded data and perform similarity or MMR (maximal marginal relevance) search upon query using MongoDB Atlas, a managed cloud MongoDB database.
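MMR search differs from plain similarity search in that it trades relevance to the query against redundancy among the results already picked. The sketch below shows the general technique in plain Python. It is not how MongoDB Atlas or the THub node implements it, and the names `mmr` and `lambda_mult` as well as the toy 2-dimensional vectors are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Greedy MMR: lambda_mult=1.0 is pure relevance, lower values favor diversity."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(item):
            _, vec = item
            relevance = cosine(query, vec)
            # Similarity to the closest document already selected (0 if none yet).
            redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best_id, _ = max(remaining.items(), key=score)
        selected.append(best_id)
        del remaining[best_id]
    return selected


docs = {"a": [1.0, 0.0], "b": [0.98, 0.2], "c": [0.0, 1.0]}
query = [0.8, 0.6]
print(mmr(query, docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query, docs, k=2, lambda_mult=0.5))  # balances relevance and diversity
```

In this toy data, "a" and "b" are near-duplicates, so with `lambda_mult=0.5` the second pick shifts to the less similar but more diverse "c".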

8)OpenSearch

Upsert embedded data and perform similarity search upon query using OpenSearch, an open-source, all-in-one vector database.

9)Pinecone

Prerequisite

1. Register an account for Pinecone

2. Click Create index.

3. Fill in the required fields:

· Index Name: name of the index to be created (e.g. "THub-demo")

· Dimensions: size of the vectors to be inserted in the index (e.g. 1536)

4. Click Create Index.
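The Dimensions value has to match the length of the vectors produced by your embedding model, otherwise upserts into the index will fail. A minimal sketch of a pre-upsert check follows. The helper name `find_dimension_mismatches` is hypothetical, and 1536 is simply the example value from the field description above.

```python
def find_dimension_mismatches(vectors, index_dimension):
    """Return the positions of vectors whose length differs from the index dimension."""
    return [i for i, vec in enumerate(vectors) if len(vec) != index_dimension]


# Two placeholder embeddings: the second one has the wrong size for a 1536-d index.
embeddings = [[0.0] * 1536, [0.0] * 768]
mismatches = find_dimension_mismatches(embeddings, index_dimension=1536)
print(mismatches)
```

If the check reports mismatches, either recreate the index with the correct dimension or switch to an embedding model whose output size matches the index.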

Setup

1. Get or create your API Key.

2. Add a new Pinecone node to the canvas and fill in the parameters:

· Pinecone Index

· Pinecone namespace (optional)

3. Create a new Pinecone credential and fill in the API Key.

4. Add additional nodes to the canvas and start the upsert process:

· Document can be connected with any node under the Document Loader category

· Embeddings can be connected with any node under the Embeddings category

5. Check the Pinecone dashboard to verify that the data has been successfully upserted:

10)Postgres

Upsert embedded data and perform similarity search upon query using pgvector on Postgres.

11)Qdrant

Prerequisites

A locally running instance of Qdrant or a Qdrant cloud instance.

To get a Qdrant cloud instance:

1. Head to the Clusters section of the Cloud Dashboard.

2. Select Clusters and then click + Create.

3. Choose your cluster configurations and region.

4. Hit Create to provision your cluster.

Setup

1. Get/Create your API Key from the Data Access Control section of the Cloud Dashboard.

2. Add a new Qdrant node on canvas.

3. Create new Qdrant credential using the API Key

4. Enter the required info into the Qdrant node:

· Qdrant server URL

· Collection name

5. Document input can be connected with any node under Document Loader category.

6. Embeddings input can be connected with any node under Embeddings category.

Filtering

Let's say you have upserted several documents, each with a unique value under the metadata key {source}.

To filter by that key, Qdrant supports the following syntax:

UI

API
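As a hedged sketch of what such a filter can look like, the snippet below builds a Qdrant filter object with a single `must` condition that matches on the source value. The `must` / `key` / `match` / `value` structure follows Qdrant's filtering conditions. The payload path `metadata.source`, the helper name `source_filter`, and the example value "apple" are assumptions, so check how your documents are actually stored in the collection.

```python
import json


def source_filter(source_value, key="metadata.source"):
    """Build a Qdrant filter that keeps only points whose payload key equals the value."""
    return {
        "must": [
            {"key": key, "match": {"value": source_value}},
        ]
    }


# The resulting JSON can be pasted into the filter field of the Qdrant node
# or sent in the request body when calling the flow through the API.
print(json.dumps(source_filter("apple"), indent=2))
```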

12)Redis

Prerequisite

Spin up a Redis-Stack Server using Docker

docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest

Setup

1. Add a new Redis node on canvas.

2. Create new Redis credential.

3. Select the type of Redis credential. Choose Redis API if you have a username and password, otherwise choose Redis URL:

4. Fill in the URL:

5. Now you can start upserting data with Redis:

6. Navigate to the Redis Insight portal and open your database to see all the data that has been upserted:
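For the Redis URL credential, the value follows the usual `redis://` connection URL form. The sketch below assembles one for the container started in the prerequisite step. The helper name `redis_url` is hypothetical, and `localhost` with port 6379 assumes THub runs on the same machine that published the container port.

```python
from urllib.parse import quote


def redis_url(host="localhost", port=6379, username=None, password=None):
    """Assemble a redis:// connection URL, percent-encoding any credentials."""
    auth = ""
    if password is not None:
        user = quote(username or "", safe="")
        auth = f"{user}:{quote(password, safe='')}@"
    return f"redis://{auth}{host}:{port}"


# No credentials: matches the redis-stack-server container from the prerequisite.
print(redis_url())
# With credentials: special characters such as "@" must be percent-encoded.
print(redis_url(username="default", password="p@ss"))
```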

13)SingleStore

Setup

1. Register an account on SingleStore

2. Log in to the portal. On the left side panel, click CLOUD -> Create new workspace group. Then click the Create Workspace button.

3. Select cloud provider and data region, then click Next:

4. Review and click Create Workspace:

5. You should now see your workspace created:

6. Proceed to create a database

7. You should be able to see your database created and attached to the workspace:

8. Click Connect from the workspace dropdown -> Connect Directly:

9. You can specify a new password or use the default generated one. Then click Continue:

10. On the tabs, switch to Your App, and select Node.js from the dropdown. Take note/save the Username, Host, Password as you will need these in THub later.

11. Back to THub canvas, drag and drop SingleStore nodes. Click Create New from the Credentials dropdown:

12. Put in the Username and Password

13. Then specify the Host and Database Name:

14. Now you can start upserting data with SingleStore:

15. Navigate back to the SingleStore portal and open your database to see all the data that has been upserted:

14)Supabase

Prerequisite

1. Register an account for Supabase

· Click New project

2. Input required fields

Field Name | Description
Name | name of the project to be created (e.g. THub)
Database Password | password to your Postgres database

3. Click Create new project and wait for the project to finish setting up

4. Click SQL Editor

5. Click New query

6. Copy and paste the SQL query below, then run it with Ctrl + Enter or by clicking RUN. Take note of the table name and function name.

Table name: documents

Query name: match_documents

Setup

· Click Project Settings

· Get your Project URL & API Key

· Copy and paste each detail (API Key, URL, Table Name, Query Name) into the Supabase node

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

15)Upstash Vector

Upsert data as embedding or string and perform similarity search with Upstash, a serverless data platform.

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

· Record manager can be connected with the node under Record manager

16)Vectara

Prerequisite

· Register an account for Vectara

· Click Create Corpus

· Name the corpus to be created and click Create Corpus then wait for the corpus to finish setting up.

Setup

· Click on the "Access Control" tab in the corpus view

· Click on the "Create API Key" button, choose a name for the API key and pick the QueryService & IndexService option

· Click Create to create the API key

· Get your Corpus ID, API Key, and Customer ID by clicking the down-arrow under "copy" for your new API key:

· Back to THub canvas, and create your chatflow. Click Create New from the Credentials dropdown and enter your Vectara credentials.

· Document can be connected with any node under Document Loader category

Vectara Query Parameters

For finer control over the Vectara query parameters, click "Additional Parameters" to change the following parameters from their defaults:

· Metadata Filter: Vectara supports metadata filtering. To use filtering, ensure that the metadata fields you want to filter by are defined in your Vectara corpus.

· "Sentences before" and "Sentences after": these control how many sentences before/after the matching text are returned as results from the Vectara retrieval engine

· Lambda: defines the behavior of hybrid search in Vectara

· Top-K: how many results to return from Vectara for the query

· MMR-K: number of results to use for MMR (maximal marginal relevance)

17)Weaviate

Upsert embedded data and perform similarity or MMR search using Weaviate, a scalable open-source vector database.

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

· Record manager can be connected with the node under Record manager

18)Zep Collection - Open Source

Upsert embedded data and perform similarity or MMR search upon query using Zep, a fast and scalable building block for LLM apps.

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

