🗃️Vector Stores

A vector store is a specialized database for storing and retrieving high-dimensional numerical vectors. It efficiently manages and indexes these vectors for fast similarity searches.
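To make "similarity search" concrete, here is a minimal sketch of the three metrics that vector stores commonly offer (cosine similarity, dot product, and Euclidean distance). The function names and the toy two-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two vectors after normalizing their lengths."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors (hypothetical data, chosen so the results are easy to check by hand).
query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]

print(cosine_similarity(query, same_direction))   # same direction, different length
print(cosine_similarity(query, orthogonal))       # unrelated directions
print(euclidean_distance(query, same_direction))
```

Cosine similarity ignores vector length, while dot product and Euclidean distance do not, which is why the metric chosen for a collection should match how the embedding model was trained.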

1) AstraDB

Setup

1. Register an account on AstraDB

2. Log in to the portal and create a database.

3. Choose Serverless (Vector), then fill in the Database name, Provider, and Region.

4. After the database has been set up, copy the API Endpoint and generate an Application Token.

5. Create a new collection, then select the desired dimension and similarity metric:

6. Back on the THub canvas, drag and drop the Astra node. Click Create New from the Credentials dropdown:

7. Specify the API Endpoint and Application Token:

8. You can now upsert data to AstraDB.

9. Navigate back to the Astra portal and open your collection to see all the data that has been upserted:

10. Start querying!

2) Chroma

Prerequisite

1. Download & install Docker and Git

2. Clone Chroma's repository with your terminal

3. Change directory to your cloned Chroma repository

4. Run docker compose to build the Chroma image and container

5. If successful, you will see the Docker images spun up:

Setup

Additional

1. If you are running both THub and Chroma on Docker, there are additional steps involved.

2. Open docker-compose.yml in THub:

cd THub && cd docker

3. Modify the file to:

4. Spin up the THub Docker image.

5. For the Chroma URL, specify http://host.docker.internal:8000 on Windows and macOS. On Linux-based systems, host.docker.internal is not available by default, so use the default Docker gateway instead: http://172.17.0.1:8000
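The rule for picking the Chroma URL can be sketched as a small helper. This is only an illustration of the choice described above: the function name `chroma_url` is hypothetical, and 172.17.0.1 is the usual default bridge gateway, which may differ if your Docker network is customized.

```python
import platform


def chroma_url(system=None, port=8000):
    """Return the Chroma URL to enter in THub when both run in Docker.

    `system` defaults to the current OS name as reported by platform.system()
    ("Windows", "Darwin" for macOS, or "Linux").
    """
    system = system or platform.system()
    if system in ("Windows", "Darwin"):
        # Docker Desktop resolves this hostname to the host machine.
        return f"http://host.docker.internal:{port}"
    # Assumption: the default Docker bridge gateway is in use on Linux.
    return f"http://172.17.0.1:{port}"


print(chroma_url())
```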

3) Elastic

Prerequisite

1. You can use the official Docker image to get started, or you can use Elastic Cloud, Elastic's official cloud service. This guide uses the cloud version.

2. Register an account or log in with an existing account on Elastic Cloud.

3. Click Create deployment. Then name your deployment and choose the provider.

4. After the deployment is finished, you should see the setup guides. Click the Set up vector search option.

5. You should now see the Getting started page for Vector Search.

6. On the left-hand sidebar, click Indices, then create a new index.

7. Select the API ingestion method.

8. Name your search index, then click Create Index.

9. After the index has been created, generate a new API key. Take note of both the generated API key and the URL.

Setup

1. Add a new Elasticsearch node on the canvas and fill in the Index Name.

2. Add a new credential via Elasticsearch API.

3. Take the URL and API Key from Elasticsearch and fill in the fields.

4. After the credential has been created successfully, you can start upserting data.

5. After the data has been upserted successfully, you can verify it from the Elastic dashboard:

Voila! You can now start asking questions in the chat.

4) Faiss

Upsert embedded data and perform similarity search upon query using the Faiss library from Meta.

5) In-Memory Vector Store

An in-memory vector store that stores embeddings and performs an exact, linear search for the most similar embeddings.
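As a rough illustration of what "exact, linear search" means, the sketch below scores the query against every stored vector and returns the best matches. The class name, method names, and sample data are assumptions made for this example and are not THub's actual implementation.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class InMemoryVectorStore:
    """Keeps every embedding in a Python list and scans all of them per query."""

    def __init__(self):
        self._items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        self._items.append((vector, payload))

    def search(self, query, k=2):
        # Exact search: every stored vector is compared, so cost grows linearly.
        scored = [(_cosine(query, vector), payload) for vector, payload in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add([1.0, 0.0], "a")
store.add([0.0, 1.0], "b")
store.add([1.0, 1.0], "c")
results = store.search([1.0, 0.0], k=2)
print([payload for _, payload in results])
```

Because nothing is indexed or persisted, this approach suits small datasets and quick experiments; the data is lost when the process stops.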

6) Milvus

Upsert embedded data and perform similarity search upon query using Milvus, an open-source vector database designed for large-scale similarity search.

7) MongoDB Atlas

Upsert embedded data and perform similarity or MMR (maximal marginal relevance) search upon query using MongoDB Atlas, a managed cloud MongoDB database.

8) OpenSearch

Upsert embedded data and perform similarity search upon query using OpenSearch, an open-source, all-in-one vector database.

9) Pinecone

Prerequisite

1. Register an account for Pinecone

2. Click Create index

3. Fill in the required fields:

• Index Name, name of the index to be created. (e.g. "thub-demo")

• Dimensions, size of the vectors to be inserted in the index. (e.g. 1536)

4. Click Create Index
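The Dimensions value must match the length of the vectors your embedding model produces, otherwise upserts are rejected. A minimal pre-flight check, assuming the example value of 1536 from above, might look like this (the helper name `mismatched_vectors` is hypothetical):

```python
INDEX_DIMENSIONS = 1536  # the example value used when creating the index


def mismatched_vectors(vectors, expected=INDEX_DIMENSIONS):
    """Return the positions of vectors whose length differs from the index dimension."""
    return [i for i, vector in enumerate(vectors) if len(vector) != expected]


# Hypothetical batch: one vector of the right size, one from a smaller model.
batch = [[0.0] * 1536, [0.0] * 768]
print(mismatched_vectors(batch))
```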

Setup

1. Get or create your API Key

2. Add a new Pinecone node to the canvas and fill in the parameters:

· Pinecone Index

· Pinecone namespace (optional)

3. Create a new Pinecone credential and fill in the API Key

4. Add additional nodes to the canvas and start the upsert process

· Document can be connected with any node under the Document Loader category

· Embeddings can be connected with any node under the Embeddings category

5. Check the Pinecone dashboard to verify that the data has been successfully upserted:

10) Postgres

Upsert embedded data and perform similarity search upon query using pgvector on Postgres.

11) Qdrant

Prerequisites

A locally running instance of Qdrant or a Qdrant cloud instance.

To get a Qdrant cloud instance:

1. Head to the Clusters section of the Cloud Dashboard.

2. Select Clusters and then click + Create.

3. Choose your cluster configurations and region.

4. Hit Create to provision your cluster.

Setup

1. Get/Create your API Key from the Data Access Control section of the Cloud Dashboard.

2. Add a new Qdrant node on canvas.

3. Create new Qdrant credential using the API Key

4. Enter the required info into the Qdrant node:

· Qdrant server URL

· Collection name

5. Document input can be connected with any node under the Document Loader category.

6. Embeddings input can be connected with any node under the Embeddings category.

Filtering

Let's say you have different documents upserted, each with a unique value under the metadata key {source}.

To filter by that key, Qdrant supports the following syntax:
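As a hedged sketch, a Qdrant filter is a JSON object with clauses such as `must`, each holding conditions made of a `key` and a `match`. The snippet below builds one for the {source} key. The payload path `metadata.source` and the value "apple" are assumptions for illustration; check how your documents are actually stored in the collection before relying on the path.

```python
import json


def source_filter(source_value, key="metadata.source"):
    """Build a Qdrant filter that keeps only points whose source equals the given value.

    `key` is an assumed payload path; adjust it to match your collection.
    """
    return {
        "must": [
            {"key": key, "match": {"value": source_value}},
        ]
    }


# "apple" is a hypothetical source value.
print(json.dumps(source_filter("apple"), indent=2))
```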

API

12) Redis

Prerequisite

Spin up a Redis-Stack Server using Docker

docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest

Setup

1. Add a new Redis node on canvas.

2. Create new Redis credential.

3. Select the type of Redis credential. Choose Redis API if you have a username and password; otherwise choose Redis URL:

4. Fill in the URL:
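For the Docker container started in the prerequisite step, the URL would typically be redis://localhost:6379. The sketch below shows how such a URL is assembled, including percent-encoding of credentials when a password is set; the helper name `redis_url` and the sample password are hypothetical.

```python
from urllib.parse import quote, urlsplit


def redis_url(host="localhost", port=6379, username=None, password=None):
    """Build a redis:// connection URL, encoding credentials if a password is given."""
    auth = ""
    if password is not None:
        # Special characters such as "@" must be percent-encoded inside a URL.
        auth = f"{quote(username or '', safe='')}:{quote(password, safe='')}@"
    return f"redis://{auth}{host}:{port}"


print(redis_url())                  # local container without authentication
print(redis_url(password="p@ss"))   # hypothetical password containing "@"
print(urlsplit(redis_url()).port)   # parsed back to confirm the port
```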

5. Now you can start upserting data with Redis:

6. Navigate to the Redis Insight portal and open your database to see all the data that has been upserted:

13) SingleStore

Setup

1. Register an account on SingleStore

2. Log in to the portal. On the left side panel, click CLOUD -> Create new workspace group. Then click the Create Workspace button.

3. Select cloud provider and data region, then click Next:

4. Review and click Create Workspace:

5. You should now see your workspace created:

6. Proceed to create a database

7. You should be able to see your database created and attached to the workspace:

8. Click Connect from the workspace dropdown -> Connect Directly:

9. You can specify a new password or use the default generated one. Then click Continue:

10. On the tabs, switch to Your App, and select Node.js from the dropdown. Take note/save the Username, Host, Password as you will need these in THub later.

11. Back to THub canvas, drag and drop SingleStore nodes. Click Create New from the Credentials dropdown:

12. Put in the Username and Password

13. Then specify the Host and Database Name:

14. Now you can start upserting data with SingleStore:

15. Navigate back to the SingleStore portal and open your database to see all the data that has been upserted:

14) Supabase

Prerequisite

1. Register an account for Supabase

2. Click New project

3. Input the required fields:

· Name: name of the project to be created (e.g. THub)

· Database Password: password to your Postgres database

4. Click Create new project and wait for the project to finish setting up

5. Click SQL Editor

6. Click New query

7. Copy and paste the SQL query below and run it with Ctrl + Enter or by clicking RUN. Take note of the table name and function name.

Table name: documents

Query name: match_documents

Setup

· Click Project Settings

· Get your Project URL & API Key

· Copy and paste each detail (API Key, URL, Table Name, Query Name) into the Supabase node

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

15) Upstash Vector

Upsert data as embedding or string and perform similarity search with Upstash, a serverless data platform.

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

· Record Manager can be connected with any node under the Record Manager category

16) Vectara

Prerequisite

· Register an account for Vectara

· Click Create Corpus

· Name the corpus to be created and click Create Corpus then wait for the corpus to finish setting up.

Setup

· Click on the "Access Control" tab in the corpus view

· Click on the "Create API Key" button, choose a name for the API key and pick the QueryService & IndexService option

· Click Create to create the API key

· Get your Corpus ID, API Key, and Customer ID by clicking the down-arrow under "copy" for your new API key:

· Back to THub canvas, and create your chatflow. Click Create New from the Credentials dropdown and enter your Vectara credentials.

· Document can be connected with any node under Document Loader category

Vectara Query Parameters

For finer control over the Vectara query parameters, click "Additional Parameters". You can then change the following parameters from their defaults:

· Metadata Filter: Vectara supports metadata filtering. To use filtering, ensure that the metadata fields you want to filter by are defined in your Vectara corpus.

· "Sentences before" and "Sentences after": control how many sentences before and after the matching text are returned as results from the Vectara retrieval engine

· Lambda: defines the behavior of hybrid search in Vectara

· Top-K: how many results to return from Vectara for the query

· MMR-K: number of results to use for MMR (maximal marginal relevance)
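To show what MMR does with its candidate results, here is a generic sketch of maximal marginal relevance re-ranking. It is not Vectara's implementation: the function name, the `relevance_weight` parameter, and the toy vectors are assumptions. The idea is to pick results one at a time, rewarding similarity to the query and penalizing similarity to results already chosen.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k=2, relevance_weight=0.5):
    """Return indices of up to k candidates, balancing relevance against redundancy.

    relevance_weight=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = _cosine(query, candidates[i])
        # Highest similarity to anything already picked (0.0 when nothing is picked yet).
        redundancy = max((_cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return relevance_weight * relevance - (1 - relevance_weight) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]
print(mmr(query, candidates, k=2, relevance_weight=0.5))
print(mmr(query, candidates, k=2, relevance_weight=1.0))
```

With a balanced weight, the near-duplicate is skipped in favor of the more distinct candidate; with pure relevance, the two near-duplicates are returned together.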

17) Weaviate

Upsert embedded data and perform similarity or MMR search using Weaviate, a scalable open-source vector database.

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

· Record Manager can be connected with any node under the Record Manager category

18) Zep Collection - Open Source

Upsert embedded data and perform similarity or MMR search upon query using Zep, a fast and scalable building block for LLM apps.

· Document can be connected with any node under Document Loader category

· Embeddings can be connected with any node under Embeddings category

19) Couchbase Vector Store

Couchbase integrates seamlessly with THub as a high-performance vector store, enabling efficient storage and retrieval of vector embeddings.

Key Features:

Data Upsertion: Allows upserting of embedded data into Couchbase buckets, scopes, and collections.
Vector Search: Supports vector similarity searches using approximate nearest neighbor (ANN) algorithms.
Integration with THub: Facilitates the creation of Retrieval-Augmented Generation (RAG) pipelines by combining document loaders, embedding models, and retrievers.

Use Case Example:

Workflow Setup: A typical THub setup includes nodes for uploading documents (e.g., PDFs), splitting text into chunks, generating embeddings (e.g., using OpenAI models), and storing them in Couchbase. Retrieval nodes can then fetch relevant documents based on user queries.
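The "splitting text into chunks" step of this workflow can be sketched as a simple character-based splitter with overlap, so that sentences cut at a boundary still appear in full in a neighboring chunk. The function name and the default sizes are illustrative assumptions; text splitter nodes usually offer more options, such as splitting on separators or tokens.

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk would then be passed to the embedding model and upserted together with its metadata.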

20) Document Store (Vector)

The Document Store (Vector) node in THub offers a centralized approach to managing and retrieving vectorized documents.

Key Features:

Data Management: Enables uploading, splitting, and preparing datasets for upsertion in a single location.
Versatility: Supports various data formats, simplifying data handling within THub.
API Operations: Provides endpoints for creating, retrieving, updating, and deleting document stores and their contents.

Use Case Example:

Insurance Policy Retrieval: Setting up a system to retrieve information about specific insurance policies by uploading relevant documents, processing them into vector embeddings, and enabling semantic search capabilities.

21) Meilisearch Vector Store

Meilisearch, known for its lightweight and fast search capabilities, has introduced vector search functionalities, making it suitable for semantic and hybrid search applications.

Key Features:

AI-Powered Search: Utilizes large language models (LLMs) to retrieve search results based on the meaning and context of queries.
Embedding Integration: Supports configuring embedders (e.g., OpenAI) to translate documents into embeddings for semantic search.
Hybrid Search: Combines traditional keyword-based search with vector search for enhanced relevance.
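One common way to think about hybrid search is as a weighted blend of a keyword score and a vector score. The sketch below illustrates that idea only; it is not Meilisearch's internal ranking, and the function names, the `semantic_ratio` parameter, and the scores are assumptions for this example.

```python
def hybrid_score(keyword_score, vector_score, semantic_ratio=0.5):
    """Blend two scores: 0.0 means keyword only, 1.0 means vector only."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    return (1 - semantic_ratio) * keyword_score + semantic_ratio * vector_score


def rank(documents, semantic_ratio=0.5):
    """Sort document ids by blended score, best first.

    `documents` maps an id to a (keyword_score, vector_score) pair.
    """
    return sorted(
        documents,
        key=lambda doc_id: hybrid_score(*documents[doc_id], semantic_ratio),
        reverse=True,
    )


# Hypothetical scores: "a" matches the keywords, "b" matches the meaning.
documents = {"a": (0.9, 0.1), "b": (0.2, 0.8)}
print(rank(documents, semantic_ratio=0.0))
print(rank(documents, semantic_ratio=1.0))
```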

Use Case Example:

E-commerce Search: Implementing a search system that understands user intent and context, providing more accurate product recommendations and search results.

