📁 Document Loaders

Document loaders let you load documents from different sources such as PDF, TXT, CSV, Notion, and Confluence. They are often used together with vector stores, where the loaded documents are upserted as embeddings.

1) API Loader

Loads data from an external API endpoint and converts the response into documents that can be processed by downstream components.

Setup

• Document Loaders > drag API Loader node
• Select the HTTP Method required to call the API (GET, POST)
• Enter the API URL from which the data needs to be fetched
• Configure Additional Parameters if needed
• Execute the loader to retrieve API data

You can now use the API Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The API Loader allows users to fetch data from external APIs and convert the retrieved response into structured documents. These documents can then be used for further processing such as embedding generation, indexing, or retrieval-based question answering. It is useful when working with dynamic data sources such as REST APIs, web services, or internal backend services.

Features

· API Data Retrieval: Fetches data directly from external APIs using supported HTTP methods.

· Structured Document Conversion: Converts API responses into documents that can be processed by AI pipelines.

· Flexible Integration: Supports integration with different APIs and backend services.

· Text Processing Support: Works with text splitters to break large API responses into smaller chunks.

· Automation Friendly: Enables automated workflows by continuously fetching and processing API data.
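
The fetch-then-convert flow described above can be sketched in plain JavaScript. This is an illustrative sketch, not THub's internal implementation: the document shape (`pageContent` plus `metadata`) follows the convention this page describes for the Custom Document Loader, and the URL and field names are placeholders.

```javascript
// Hedged sketch: converting an API response into document objects.
// The { pageContent, metadata } shape mirrors the convention described
// for the Custom Document Loader; metadata fields here are illustrative.
function apiResponseToDocuments(data, sourceUrl) {
  // Normalise the response to an array so a single object is handled too.
  const items = Array.isArray(data) ? data : [data];
  return items.map((item, i) => ({
    pageContent: JSON.stringify(item),
    metadata: { source: sourceUrl, index: i },
  }));
}

// Usage (any HTTP client works; the endpoint is a placeholder):
// const res = await fetch("https://api.example.com/items");
// const docs = apiResponseToDocuments(await res.json(), res.url);
```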

2) Airtable

Loads records from an Airtable base and converts them into documents that can be used in AI workflows.

Setup

• Document Loaders > drag Airtable Loader node
• Connect Credential > click Create New
• Provide the Airtable API Key
• Enter the Base ID of the Airtable base
• Enter the Table ID from which records need to be fetched
• Optionally provide the View ID to filter records
• Configure Additional Parameters if required

You can now use the Airtable Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Airtable Loader allows users to retrieve structured data stored in Airtable bases and convert it into documents for further processing. It connects to Airtable using API credentials and fetches records from a specified base, table, and optional view. The retrieved data can then be used in pipelines such as document processing, embedding generation, retrieval systems, or knowledge base creation.

Features

· Airtable Integration: Connects directly to Airtable using secure API credentials.

· Structured Data Retrieval: Fetches records from specified bases and tables.

· View Filtering: Supports retrieving records from a specific Airtable view.

· Document Conversion: Converts Airtable records into documents usable in AI pipelines.

· Text Processing Support: Can integrate with text splitters to process large datasets efficiently.
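
The record-to-document conversion can be sketched as follows. The response shape (`records` containing `id` and `fields`) matches Airtable's public REST API; the document shape and the flattening of fields into text are illustrative assumptions, not THub internals.

```javascript
// Hedged sketch: turning Airtable records into documents. The
// { records: [{ id, fields }] } shape matches Airtable's REST API.
function airtableRecordsToDocuments(records) {
  return records.map((rec) => ({
    // Flatten each record's fields into "key: value" lines of text.
    pageContent: Object.entries(rec.fields)
      .map(([key, value]) => `${key}: ${value}`)
      .join("\n"),
    metadata: { recordId: rec.id },
  }));
}

// Usage against the Airtable REST endpoint (baseId/tableId are placeholders):
// const res = await fetch(`https://api.airtable.com/v0/${baseId}/${tableId}`, {
//   headers: { Authorization: `Bearer ${apiKey}` },
// });
// const docs = airtableRecordsToDocuments((await res.json()).records);
```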

3) Apify Website Content Crawler

Loads and crawls website content using Apify and converts the extracted data into documents that can be used in AI pipelines.

Setup

• Document Loaders > drag Apify Website Content Crawler node
• Connect Apify API > click Create New
• Provide the Apify API Key
• Enter the Start URLs from where crawling should begin
• Select the Crawler Type depending on the crawling method
• Configure Additional Parameters if required

You can now use the Apify Website Content Crawler node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Apify Website Content Crawler allows users to crawl websites and extract structured text content using Apify’s web crawling infrastructure. It starts from the provided URLs and automatically navigates through pages to collect relevant content. The extracted data is converted into documents that can be used for further processing such as embeddings, indexing, or retrieval-based applications.

Features

· Website Crawling: Automatically crawls web pages starting from the specified URLs.

· Apify Integration: Connects to the Apify platform using API credentials to run web crawling tasks.

· Multiple Crawler Modes: Supports different crawler types such as headless browser crawling and HTTP-based crawling.

· Content Extraction: Extracts page content and converts it into structured documents.

· Text Processing Support: Works with text splitters to break large website content into smaller chunks for efficient processing.

4) BraveSearch API Document Loader

Loader used to fetch search results from the Brave Search API and convert them into documents that can be processed in AI workflows.

Setup

• Document Loaders > drag BraveSearch API Document Loader node
• Connect Credential > click Create New
• Provide the Brave Search API Key
• Enter the search Query for which results need to be retrieved
• Configure Additional Parameters if required

You can now use the BraveSearch API Document Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The BraveSearch API Document Loader retrieves search results directly from the Brave Search API based on a user-provided query. The returned results are converted into structured documents that can be used for further processing such as embeddings, indexing, or retrieval-based question answering. This loader is useful when integrating real-time web search data into AI pipelines.

Features

· Web Search Integration: Retrieves search results using the Brave Search API.

· Query-Based Retrieval: Allows users to fetch information based on custom search queries.

· Document Conversion: Converts search results into structured documents suitable for AI workflows.

· Real-Time Data Access: Enables AI applications to use up-to-date information from web search results.

· Pipeline Compatibility: Works with text splitters and downstream components for further processing.
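
Conceptually, the result-to-document step looks like the sketch below. The result fields (`title`, `url`, `description`) follow Brave Search's web search response; the document shape is an illustrative assumption.

```javascript
// Hedged sketch: mapping Brave Search web results into documents.
// Brave's API returns results under web.results with title, url,
// and description fields.
function braveResultsToDocuments(results) {
  return results.map((r) => ({
    pageContent: `${r.title}\n${r.description}`,
    metadata: { source: r.url },
  }));
}

// Usage (endpoint and header follow Brave's public API; the key is yours):
// const res = await fetch(
//   `https://api.search.brave.com/res/v1/web/search?q=${encodeURIComponent(query)}`,
//   { headers: { "X-Subscription-Token": apiKey } }
// );
// const docs = braveResultsToDocuments((await res.json()).web.results);
```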

5) Cheerio Web Scraper

Loader used to scrape content from web pages using the Cheerio library and convert the extracted data into documents for AI processing.

Setup

• Document Loaders > drag Cheerio Web Scraper node
• Enter the URL of the website from which content needs to be extracted
• Use Manage Links if multiple pages or links need to be scraped
• Configure Additional Parameters if required

You can now use the Cheerio Web Scraper node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Cheerio Web Scraper allows users to extract content from web pages using the Cheerio HTML parsing library. It retrieves the HTML content from the specified URL and parses the page to extract readable text. The extracted content is then converted into documents that can be used in AI pipelines such as embedding generation, indexing, or knowledge base creation.

Features

· Web Page Scraping: Extracts content directly from website pages using the provided URL.

· HTML Parsing: Uses the Cheerio library to efficiently parse and process HTML content.

· Multi-Page Support: Allows scraping of multiple links using the Manage Links option.

· Document Conversion: Converts scraped web content into structured documents for AI processing.

· Pipeline Compatibility: Works with text splitters and downstream components in AI workflows.
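
The HTML-to-text step can be illustrated with a simplified stand-in. Cheerio itself parses HTML properly (roughly `const $ = cheerio.load(html); $("body").text()`); the sketch below swaps in a naive regex so it runs without the library, which is not robust enough for real pages.

```javascript
// Simplified stand-in for the Cheerio parsing step: strip scripts,
// styles, and tags, collapse whitespace, and wrap the text as a document.
// A real scraper should use a proper HTML parser such as Cheerio.
function htmlToDocument(html, url) {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, "") // drop script blocks
    .replace(/<style[\s\S]*?<\/style>/gi, "")   // drop style blocks
    .replace(/<[^>]+>/g, " ")                    // strip remaining tags
    .replace(/\s+/g, " ")
    .trim();
  return { pageContent: text, metadata: { source: url } };
}
```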

6) Confluence Loader

Loader used to retrieve content from Confluence spaces and convert the retrieved pages into documents for AI processing.

Setup

• Document Loaders > drag Confluence node
• Connect Credential > click Create New
• Provide the Confluence API credentials
• Enter the Base URL of the Confluence workspace (for example https://example.atlassian.net/wiki)
• Enter the Space Key from which pages need to be retrieved
• Set the Limit to define the number of pages to fetch
• Configure Additional Parameters if required

You can now use the Confluence Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Confluence Loader allows users to retrieve documentation and content stored in Confluence spaces. It connects to the Confluence workspace using API credentials and fetches pages from a specified space. The retrieved pages are converted into structured documents that can be used in AI pipelines such as embeddings, indexing, or retrieval-based question answering systems.

Features

· Confluence Integration: Connects directly to Confluence using secure API credentials.

· Space-Based Retrieval: Retrieves pages from a specific Confluence space using the space key.

· Configurable Page Limit: Allows users to define how many pages should be fetched.

· Document Conversion: Converts Confluence pages into structured documents for AI workflows.

· Pipeline Compatibility: Works with text splitters and other downstream AI processing components.

7) CSV File

Loader used to read data from CSV files and convert the content into documents that can be processed in AI workflows.

Setup

• Document Loaders > drag CSV File node
• Upload the CSV file using the Upload File option
• Optionally enter the column name under Single Column Extraction if only a specific column needs to be processed
• Configure Additional Parameters if required

You can now use the CSV File Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The CSV File Loader allows users to import structured data stored in CSV files and convert the contents into documents. Each row or selected column from the CSV file can be processed and transformed into text documents. These documents can then be used in AI pipelines such as embeddings generation, indexing, and retrieval-based question answering.

Features

· CSV File Import: Allows users to upload and process CSV files directly.

· Structured Data Processing: Converts tabular data into document format for AI workflows.

· Column-Based Extraction: Supports extracting data from a specific column when needed.

· Document Conversion: Transforms CSV content into structured documents.

· Pipeline Compatibility: Works with text splitters and downstream AI processing components.
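
The row-to-document conversion, including single-column extraction, can be sketched as below. This is an illustrative sketch only: it uses naive comma splitting with no quoted-field handling, whereas a real loader would use a proper CSV parser.

```javascript
// Hedged sketch: each CSV row becomes one document; if a column name is
// given, only that column's value is kept. Naive splitting — no support
// for quoted fields or embedded commas.
function csvToDocuments(csvText, singleColumn) {
  const [headerLine, ...rows] = csvText.trim().split("\n");
  const headers = headerLine.split(",");
  return rows.map((row, i) => {
    const cells = row.split(",");
    const pageContent =
      singleColumn !== undefined
        ? cells[headers.indexOf(singleColumn)]
        : headers.map((h, j) => `${h}: ${cells[j]}`).join("\n");
    return { pageContent, metadata: { row: i + 1 } };
  });
}
```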

8) Document Store

Loader used to retrieve documents from an existing document store and make them available for processing in AI workflows.

Setup

• Document Loaders > drag Document Store node
• Select the required store from the Select Store dropdown
• Ensure the selected store already contains stored documents
• Configure Additional Parameters if required

You can now use the Document Store Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Document Store Loader allows users to access documents that are already stored in a document storage system. Instead of uploading or fetching data from external sources, this loader retrieves previously stored documents and makes them available for further processing in AI pipelines such as embeddings, indexing, or retrieval-based applications.

Features

· Existing Document Access: Retrieves documents already stored in the document store.

· Store Selection: Allows users to choose from available document stores.

· Document Reuse: Enables reuse of previously processed or stored data.

· Workflow Integration: Integrates retrieved documents into AI processing pipelines.

· Efficient Data Management: Helps manage and reuse document datasets efficiently.

9) Custom Document Loader

Loader used to create documents dynamically using custom input variables and a JavaScript function.

Setup

• Document Loaders > drag Custom Document Loader node
• Click Input Variables to define the variables that will be used as input
• Write the required JavaScript logic in the Javascript Function section
• The function should return document objects containing pageContent and optional metadata
• Configure Additional Parameters if required

You can now use the Custom Document Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Custom Document Loader allows users to create documents programmatically using custom input variables and JavaScript logic. Instead of loading data from external sources, this loader lets users define how documents should be generated by writing a function that returns document objects. Each document typically contains pageContent and optional metadata fields such as title or tags. This loader is useful when data needs to be dynamically constructed before entering the AI processing pipeline.

Features

· Custom Document Creation: Allows users to generate documents using custom logic.

· Input Variable Support: Supports dynamic inputs that can be used inside the JavaScript function.

· Flexible Data Processing: Enables transformation and structuring of data before creating documents.

· Metadata Support: Allows adding metadata fields such as titles or tags to documents.

· Workflow Integration: Generated documents can be used in downstream pipelines such as embeddings, indexing, or retrieval systems.
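
An example of the kind of function the Javascript Function field expects: it returns an array of document objects with `pageContent` and optional `metadata`, as described above. The input variable names (`title`, `items`) are hypothetical — define your own under Input Variables, and note that the exact variable-access syntax inside THub may differ.

```javascript
// Example custom loader logic: builds one document per input item.
// The input variables (title, items) are placeholders defined under
// Input Variables; pageContent and metadata follow the shape this
// page describes.
function customLoader({ title, items }) {
  return items.map((item) => ({
    pageContent: item,
    metadata: { title, tags: ["custom"] },
  }));
}
```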

10) Docx File

Loader used to read content from DOCX files and convert the extracted text into documents for AI processing.

Setup

• Document Loaders > drag Docx File node
• Upload the DOCX file using the Upload File option
• Configure Additional Parameters if required

You can now use the Docx File Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Docx File Loader allows users to upload Microsoft Word documents and extract their textual content. The loader reads the DOCX file and converts the extracted content into structured documents that can be processed by AI pipelines. These documents can then be used for tasks such as embeddings generation, indexing, or retrieval-based question answering.

Features

· DOCX File Import: Allows users to upload and process Microsoft Word documents.

· Text Extraction: Extracts readable text from DOCX files.

· Document Conversion: Converts DOCX content into structured documents for AI workflows.

· Pipeline Compatibility: Works with text splitters and downstream AI components.

11) Epub File Loader

Loader used to read content from EPUB files and convert the extracted text into documents for AI processing.

Setup

• Document Loaders > drag Epub File node
• Upload the EPUB file using the Upload File option
• Select the Usage option to determine how the content should be divided (for example one document per chapter)
• Configure Additional Parameters if required

You can now use the Epub File Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Epub File Loader allows users to upload EPUB files and extract the content contained within them. The loader processes the structure of the EPUB file and converts chapters or sections into documents depending on the selected usage configuration. These documents can then be used in AI workflows such as knowledge base creation, embeddings generation, and retrieval systems.

Features

· EPUB File Import: Supports uploading and processing EPUB documents.

· Chapter-Based Processing: Allows splitting content into documents per chapter or section.

· Text Extraction: Extracts readable content from EPUB files.

· Document Conversion: Converts EPUB content into structured documents for AI pipelines.

· Workflow Integration: Works with text splitters and other downstream components.

12) Figma Loader

Loader used to retrieve content from Figma files and convert design data into documents for AI processing.

Setup

• Document Loaders > drag Figma node
• Connect Credential > click Create New
• Provide the Figma API credentials
• Enter the File Key of the Figma file
• Optionally provide Node IDs to retrieve specific components from the design
• Enable Recursive if nested nodes need to be included
• Configure Additional Parameters if required

You can now use the Figma Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Figma Loader allows users to retrieve content and metadata from Figma design files using the Figma API. It extracts information such as text elements and design structure from the specified file or nodes. The retrieved content is converted into structured documents which can then be used in AI workflows such as documentation generation, knowledge base creation, or retrieval-based systems.

Features

· Figma Integration: Connects directly to Figma using API credentials.

· Design Content Extraction: Retrieves text and structure from Figma design files.

· Node-Based Retrieval: Allows fetching specific nodes or components from the design.

· Recursive Extraction: Supports retrieving nested design elements.

· Document Conversion: Converts design content into structured documents for AI pipelines.



13) File Loader

Loader used to upload and read file content and convert it into documents for AI processing.

Setup

• Document Loaders > drag File Loader node
• Upload the required file using the Upload File option
• Configure Additional Parameters if required

You can now use the File Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The File Loader allows users to upload files directly into the workflow and extract their content. The loader reads the uploaded file and converts its content into structured documents that can be processed in AI pipelines. These documents can then be used for embeddings generation, indexing, or retrieval-based question answering systems.

Features

· File Upload Support: Allows users to upload files directly into the workflow.

· Content Extraction: Extracts readable text from uploaded files.

· Document Conversion: Converts file content into structured documents.

· Pipeline Compatibility: Works with text splitters and downstream AI components.

14) FireCrawl Loader

Loader used to crawl web content using the FireCrawl API and convert the extracted content into documents.

Setup

• Document Loaders > drag FireCrawl node
• Connect FireCrawl API > click Create New
• Provide the FireCrawl API Key
• Select the Type of operation such as Crawl
• Enter the URLs from which content should be retrieved
• Optionally provide a Query to refine the data extraction
• Configure Additional Parameters if required

You can now use the FireCrawl Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The FireCrawl Loader enables users to retrieve website content using the FireCrawl crawling service. It connects to the FireCrawl API and extracts content from the specified URLs. The retrieved content is then converted into structured documents that can be processed by AI pipelines for tasks such as embeddings, indexing, or knowledge base creation.

Features

· Web Crawling: Extracts content from specified URLs.

· FireCrawl Integration: Connects to FireCrawl using API credentials.

· Query-Based Filtering: Allows refining content extraction using queries.

· Document Conversion: Converts crawled content into structured documents.

· Workflow Integration: Works with text splitters and downstream AI components.

15) Folder with Files Loader

Loader used to read multiple files from a folder and convert them into documents for AI processing.

Setup

• Document Loaders > drag Folder with Files node
• Enter the Folder Path containing the files to be processed
• Enable Recursive if files inside subfolders should also be included
• Configure Additional Parameters if required

You can now use the Folder with Files Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Folder with Files Loader allows users to load multiple files from a specified folder and convert them into documents. Instead of uploading files individually, the loader scans the folder path and processes all files found within it. This is useful when working with large document collections stored locally.

Features

· Bulk File Processing: Loads and processes multiple files from a folder.

· Recursive Loading: Supports loading files from subfolders when enabled.

· Automated Document Creation: Converts file contents into structured documents.

· Efficient Data Handling: Simplifies processing of large local datasets.

· Pipeline Compatibility: Works with text splitters and downstream AI components.


16) GitBook Loader

Loader used to retrieve content from GitBook documentation pages and convert them into documents for AI processing.

Setup

• Document Loaders > drag GitBook node
• Enter the Web Path of the GitBook documentation site
• Enable Should Load All Paths if all pages under the documentation should be retrieved
• Configure Additional Parameters if required

You can now use the GitBook Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The GitBook Loader allows users to retrieve documentation content directly from GitBook sites. It reads the provided GitBook path and extracts the text content from documentation pages. The extracted content is converted into structured documents that can be processed by AI workflows such as knowledge base creation, embeddings generation, or retrieval-based question answering.

Features

· GitBook Integration: Retrieves documentation directly from GitBook sites.

· Documentation Crawling: Extracts text content from documentation pages.

· Multi-Page Retrieval: Can load all pages from the provided GitBook path.

· Document Conversion: Converts documentation into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI components.

17) GitHub Loader

Loader used to retrieve files and repository content from GitHub and convert them into documents for AI processing.

Setup

• Document Loaders > drag GitHub node
• Connect Credential > click Create New
• Provide the GitHub access credentials
• Enter the Repository Link of the GitHub project
• Specify the Branch from which files should be retrieved
• Enable Recursive if files in subdirectories should also be loaded
• Configure Additional Parameters if required

You can now use the GitHub Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The GitHub Loader allows users to retrieve files and documentation stored in GitHub repositories. It connects to the specified repository and loads files from the selected branch. The retrieved content is converted into structured documents which can be used in AI pipelines such as code analysis, documentation indexing, or retrieval-based applications.

Features

· GitHub Repository Integration: Retrieves files directly from GitHub repositories.

· Branch-Based Retrieval: Allows loading files from a specific branch.

· Recursive Loading: Supports retrieving files from nested folders.

· Document Conversion: Converts repository files into structured documents.

· Workflow Integration: Works with text splitters and AI processing pipelines.

18) Google Drive Loader

Loader used to retrieve files from Google Drive and convert their content into documents for AI processing.

Setup

• Document Loaders > drag Google Drive node
• Connect Credential > click Create New
• Provide the Google Drive API credentials
• Select the files or provide the Folder ID from which files should be retrieved
• Choose the required File Types such as Google Docs, PDF files, or text files
• Enable Include Subfolders if files inside nested folders should also be processed
• Enable Include Shared Drives if shared drive files should be included
• Set the Max Files limit if required
• Configure Additional Parameters if required

You can now use the Google Drive Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Google Drive Loader allows users to retrieve documents stored in Google Drive. It connects to the Google Drive API and loads files from the selected folder or files list. The loader supports multiple file types and converts the retrieved content into structured documents that can be used in AI workflows such as embeddings generation, indexing, or knowledge base creation.

Features

· Google Drive Integration: Retrieves files directly from Google Drive.

· Multi-Format Support: Supports Google Docs, PDFs, text files, spreadsheets, and presentations.

· Folder-Based Retrieval: Allows loading files from specific folders.

· Subfolder Support: Can include files inside nested folders.

· Document Conversion: Converts file content into structured documents for AI pipelines.


19) Google Sheets Loader

Loader used to retrieve spreadsheet data from Google Sheets and convert it into documents for AI processing.

Setup

• Document Loaders > drag Google Sheets node
• Connect Credential > click Create New
• Provide the Google Sheets API credentials
• Select the Spreadsheet from the Select Spreadsheet dropdown
• Enter Sheet Names if specific sheets need to be retrieved
• Optionally define the Range of cells to load
• Enable Include Headers if column headers should be included
• Select the Value Render Option if required
• Configure Additional Parameters if required

You can now use the Google Sheets Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Google Sheets Loader allows users to retrieve tabular data stored in Google Sheets. It connects to the Google Sheets API and loads spreadsheet data from selected sheets and ranges. The retrieved spreadsheet content is then converted into structured documents that can be used in AI pipelines such as embeddings generation, indexing, or retrieval-based applications.

Features

· Google Sheets Integration: Retrieves spreadsheet data directly from Google Sheets.

· Sheet-Level Retrieval: Allows loading data from specific sheets.

· Range-Based Extraction: Supports retrieving data from defined cell ranges.

· Header Support: Optionally includes column headers when loading data.

· Document Conversion: Converts spreadsheet content into structured documents for AI workflows.

20) Image File Loader

Loader used to upload image files and convert extracted content into documents for AI processing.

Setup

• Document Loaders > drag Image File node
• Upload the image using the Upload File option
• Configure Additional Parameters if required

You can now use the Image File Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Image File Loader allows users to upload image files and process them as documents. The loader reads the image and extracts available textual or descriptive content from it, converting the information into structured documents that can be used in AI pipelines such as indexing, embeddings generation, or retrieval workflows.

Features

· Image Upload Support: Allows users to upload image files directly.

· Content Extraction: Processes image data to extract relevant information.

· Document Conversion: Converts extracted image information into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI processing components.

21) Jira Loader

Loader used to retrieve issues and project data from Jira and convert them into documents for AI processing.

Setup

• Document Loaders > drag Jira node
• Connect Credential > click Create New
• Provide the Jira API credentials
• Enter the Jira Host URL
• Enter the Project Key of the Jira project
• Define the Limit per request if required
• Optionally specify Created After to retrieve recent issues
• Configure Additional Parameters if required

You can now use the Jira Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Jira Loader allows users to retrieve issues and project information from Jira. It connects to the Jira API and loads issues from the specified project. The retrieved issue data is converted into structured documents which can be used in AI workflows such as project analytics, knowledge base generation, or retrieval-based applications.

Features

· Jira Integration: Connects directly to Jira using API credentials.

· Project-Based Retrieval: Retrieves issues from specific Jira projects.

· Issue Filtering: Supports filtering issues using parameters such as creation date.

· Document Conversion: Converts Jira issue data into structured documents.

· Workflow Integration: Works with text splitters and AI processing pipelines.
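
The issue-to-document mapping can be sketched as below. The issue shape (`key` plus `fields.summary` and `fields.description`) follows Jira's REST search response; the document shape and the choice of fields to keep are illustrative assumptions.

```javascript
// Hedged sketch: mapping Jira issues (as returned under "issues" by the
// REST search API) into documents. Only summary and description are kept
// here; a real loader may include more fields.
function jiraIssuesToDocuments(issues) {
  return issues.map((issue) => ({
    pageContent: `${issue.fields.summary}\n\n${issue.fields.description ?? ""}`.trim(),
    metadata: { issueKey: issue.key },
  }));
}
```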

22) Json File Loader

Loader used to read data from JSON files and convert the extracted content into documents for AI processing.

Setup

• Document Loaders > drag Json File node
• Upload the JSON file using the Upload File option
• Optionally specify Pointers Extraction to retrieve specific fields from the JSON structure
• Configure Additional Parameters if required

You can now use the Json File Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Json File Loader allows users to upload JSON files and extract structured data from them. It reads the JSON structure and converts the selected fields or the entire content into documents. These documents can then be used in AI workflows such as embeddings generation, indexing, or retrieval-based applications.

Features

· JSON File Import: Allows users to upload and process JSON files.

· Structured Data Extraction: Extracts data from JSON structures.

· Pointer-Based Retrieval: Supports extracting specific keys or nested fields.

· Document Conversion: Converts JSON data into structured documents.

· Workflow Integration: Works with text splitters and AI processing pipelines.
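
Pointer-based extraction can be sketched as follows, assuming a JSON-Pointer-style path such as `/items/0/text`. Whether THub's Pointers Extraction field uses exactly this syntax is an assumption; the document shape is illustrative.

```javascript
// Hedged sketch: resolve a JSON-Pointer-style path against parsed JSON
// and wrap the result as a document. Real pointer syntax in THub may differ.
function extractByPointer(json, pointer) {
  return pointer
    .split("/")
    .filter((seg) => seg.length > 0)
    .reduce((node, seg) => (node == null ? undefined : node[seg]), json);
}

function jsonToDocument(json, pointer, fileName) {
  const value = extractByPointer(json, pointer);
  return {
    pageContent: typeof value === "string" ? value : JSON.stringify(value),
    metadata: { source: fileName, pointer },
  };
}
```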

23) Json Lines File Loader

Loader used to read JSON Lines (.jsonl) files and convert each JSON entry into documents for AI processing.

Setup

• Document Loaders > drag Json Lines File node
• Upload the JSON Lines file using the Upload File option
• Provide the Pointer Extraction key to identify which field should be extracted
• Configure Additional Parameters if required

You can now use the Json Lines File Loader node in THub.

Connections

• Text Splitter: connect any node from the Text Splitter category

Explanation

The Json Lines File Loader processes JSON Lines files where each line contains a separate JSON object. It extracts the specified field using pointer extraction and converts each entry into structured documents. These documents can then be used in AI workflows such as indexing, embeddings generation, or retrieval-based systems.

Features

· JSON Lines Support: Processes files where each line is a separate JSON object.

· Pointer Extraction: Allows extracting specific fields from each JSON entry.

· Document Generation: Converts each JSON entry into a document.

· Structured Data Processing: Handles large datasets efficiently.

· Pipeline Compatibility: Works with text splitters and downstream AI components.
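
The per-line parsing step can be sketched as below: each non-empty line is parsed as its own JSON object and the configured key becomes one document. A simple top-level key is assumed here; the actual extraction syntax in THub may differ.

```javascript
// Hedged sketch: one document per JSON Lines entry, keeping the field
// named by pointerKey (assumed to be a top-level key).
function jsonlToDocuments(jsonlText, pointerKey) {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line, i) => {
      const entry = JSON.parse(line);
      return {
        pageContent: String(entry[pointerKey]),
        metadata: { line: i + 1 },
      };
    });
}
```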

24) Microsoft Excel Loader

Loader used to read spreadsheet data from Excel files and convert it into documents for AI processing.

Setup

• Document Loaders > drag Microsoft Excel node • Upload the Excel file using the Upload File option • Configure Additional Parameters if required

You can now use the Microsoft Excel Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Microsoft Excel Loader allows users to upload Excel spreadsheets and extract their tabular data. The loader reads the spreadsheet content and converts rows or cell data into structured documents. These documents can then be used in AI pipelines such as embeddings generation, indexing, or retrieval-based workflows.

Features

· Excel File Import: Allows users to upload Excel spreadsheets.

· Tabular Data Extraction: Extracts rows and cell data from Excel sheets.

· Document Conversion: Converts spreadsheet content into structured documents.

· Workflow Integration: Works with text splitters and AI processing pipelines.

· Data Processing Support: Handles structured spreadsheet datasets efficiently.
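
Conceptually, the loader flattens each spreadsheet row into one document. The real node handles .xlsx parsing internally; the row-to-document step it performs can be sketched with plain data (the column names and values below are hypothetical):

```python
def rows_to_documents(headers, rows, sheet="Sheet1"):
    """Serialize each data row as 'header: value' pairs, one document per row."""
    docs = []
    for idx, row in enumerate(rows, start=2):  # row 1 holds the headers
        content = "\n".join(f"{h}: {v}" for h, v in zip(headers, row))
        docs.append({"pageContent": content,
                     "metadata": {"sheet": sheet, "row": idx}})
    return docs

docs = rows_to_documents(["name", "price"], [["Widget", 9.99], ["Gadget", 24.5]])
```

Keeping the sheet name and row number as metadata lets downstream retrieval point back to the exact cell range a chunk came from.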


25) Microsoft PowerPoint Loader

Loader used to read content from PowerPoint presentations and convert the extracted slide content into documents for AI processing.

Setup

• Document Loaders > drag Microsoft PowerPoint node • Upload the PowerPoint file using the Upload File option • Configure Additional Parameters if required

You can now use the Microsoft PowerPoint Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Microsoft PowerPoint Loader allows users to upload PowerPoint presentations and extract text from slides. The loader processes the uploaded presentation and converts slide content into structured documents. These documents can then be used in AI workflows such as embeddings generation, indexing, or retrieval-based question answering.

Features

· PowerPoint File Import: Allows users to upload PowerPoint presentations.

· Slide Content Extraction: Extracts text content from presentation slides.

· Document Conversion: Converts slide data into structured documents for AI workflows.

· Pipeline Compatibility: Works with text splitters and downstream AI components.

26) Microsoft Word Loader

Loader used to read content from Microsoft Word documents and convert the extracted text into documents for AI processing.

Setup

• Document Loaders > drag Microsoft Word node • Upload the Word file using the Upload File option • Configure Additional Parameters if required

You can now use the Microsoft Word Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Microsoft Word Loader allows users to upload Word documents and extract their textual content. The loader reads the uploaded file and converts the extracted content into structured documents that can be processed in AI pipelines such as embeddings generation, indexing, or retrieval-based systems.

Features

· Word Document Import: Allows users to upload Microsoft Word files.

· Text Extraction: Extracts readable text from Word documents.

· Document Conversion: Converts Word document content into structured documents.

· Workflow Integration: Works with text splitters and AI processing pipelines.

27) Notion Database Loader

Loader used to retrieve content from Notion databases and convert the retrieved data into documents for AI processing.

Setup

• Document Loaders > drag Notion Database node • Connect Credential > click Create New • Provide the Notion API credentials • Enter the Notion Database ID from which records should be retrieved • Configure Additional Parameters if required

You can now use the Notion Database Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Notion Database Loader allows users to retrieve structured content from Notion databases. It connects to the Notion API and extracts records from the specified database. The retrieved data is converted into structured documents that can be used in AI workflows such as knowledge base creation, embeddings generation, and retrieval-based applications.

Features

· Notion Integration: Connects directly to Notion using API credentials.

· Database Retrieval: Retrieves records from specified Notion databases.

· Structured Data Extraction: Extracts content stored in database fields.

· Document Conversion: Converts Notion database records into structured documents.

· Pipeline Compatibility: Works with text splitters and downstream AI components.
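
Records returned by the Notion API carry their fields in a typed properties map. The record-to-document step the loader performs can be sketched as below; the example record is a heavily trimmed-down stand-in, not a complete Notion API response:

```python
def notion_record_to_document(page):
    """Flatten a Notion page's property map into one text document."""
    lines = []
    for name, prop in page["properties"].items():
        # Title and rich_text properties carry lists of text fragments; join them.
        fragments = prop.get("title") or prop.get("rich_text") or []
        text = "".join(f["plain_text"] for f in fragments)
        if text:
            lines.append(f"{name}: {text}")
    return {"pageContent": "\n".join(lines), "metadata": {"id": page["id"]}}

# Trimmed example record in the Notion API's property shape.
record = {"id": "abc123", "properties": {
    "Name": {"title": [{"plain_text": "Quarterly plan"}]},
    "Notes": {"rich_text": [{"plain_text": "Draft due Friday"}]}}}
doc = notion_record_to_document(record)
```

Real databases also contain select, date, and number property types, which a full implementation would serialize as well.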


28) Notion Folder Loader

Loader used to retrieve content from a Notion folder and convert the pages inside it into documents for AI processing.

Setup

• Document Loaders > drag Notion Folder node • Enter the Notion Folder path containing the pages to be retrieved • Configure Additional Parameters if required

You can now use the Notion Folder Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Notion Folder Loader allows users to load multiple pages stored inside a Notion folder. It retrieves the content of each page within the specified folder and converts the extracted data into structured documents. These documents can then be used in AI pipelines such as embeddings generation, indexing, or retrieval-based applications.

Features

· Folder-Based Retrieval: Loads multiple pages stored inside a Notion folder.

· Structured Content Extraction: Extracts text content from Notion pages.

· Bulk Document Loading: Processes multiple pages from a single folder path.

· Document Conversion: Converts Notion content into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI components.

29) Notion Page Loader

Loader used to retrieve content from a specific Notion page and convert it into documents for AI processing.

Setup

• Document Loaders > drag Notion Page node • Connect Credential > click Create New • Provide the Notion API credentials • Enter the Notion Page ID from which the content should be retrieved • Configure Additional Parameters if required

You can now use the Notion Page Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Notion Page Loader allows users to retrieve content from a specific Notion page. It connects to the Notion API and extracts the page content, converting it into structured documents. These documents can then be used in AI workflows such as knowledge base creation, embeddings generation, and retrieval-based question answering.

Features

· Notion API Integration: Connects directly to Notion using API credentials.

· Page-Level Retrieval: Retrieves content from a specific Notion page.

· Structured Content Extraction: Extracts text and structured data from Notion pages.

· Document Conversion: Converts Notion page content into structured documents.

· Workflow Integration: Works with text splitters and AI processing pipelines.

30) PDF File Loader

Loader used to read content from PDF files and convert the extracted text into documents for AI processing.

Setup

• Document Loaders > drag PDF File node • Upload the PDF file using the Upload File option • Select the Usage option (for example, one document per page) • Configure Additional Parameters if required

You can now use the PDF File Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The PDF File Loader allows users to upload PDF documents and extract the text contained within them. The loader processes the uploaded file and converts its content into structured documents. Depending on the selected usage option, the content can be divided by page or processed as a single document. These documents can then be used in AI pipelines such as embeddings generation, indexing, or retrieval-based systems.

Features

· PDF File Import: Allows users to upload and process PDF documents.

· Page-Based Processing: Supports splitting content into documents per page.

· Text Extraction: Extracts readable text from PDF files.

· Document Conversion: Converts PDF content into structured documents.

· Pipeline Compatibility: Works with text splitters and downstream AI components.
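
The Usage option described above decides whether page texts become separate documents or one merged document. A minimal sketch of that step, starting from already-extracted page texts (PDF parsing itself is handled by the node):

```python
def pages_to_documents(pages, source, one_per_page=True):
    """Apply the loader's usage option to already-extracted page texts."""
    if one_per_page:
        return [{"pageContent": text,
                 "metadata": {"source": source, "page": n}}
                for n, text in enumerate(pages, start=1)]
    # Single-document mode: concatenate all pages into one document.
    return [{"pageContent": "\n\n".join(pages), "metadata": {"source": source}}]

pages = ["Intro page text", "Second page text"]
per_page = pages_to_documents(pages, "report.pdf")
merged = pages_to_documents(pages, "report.pdf", one_per_page=False)
```

Per-page documents keep page numbers in the metadata, which is useful when answers need to cite where in the PDF a passage appeared.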

31) Plain Text Loader

Loader used to input plain text content and convert it into documents for AI processing.

Setup

• Document Loaders > drag Plain Text node • Enter the text content in the Text field • Configure Additional Parameters if required

You can now use the Plain Text Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Plain Text Loader allows users to manually input text directly into the workflow. The entered content is converted into structured documents that can be processed in AI pipelines. This loader is useful when users want to quickly test workflows or provide small text inputs without uploading files.

Features

· Direct Text Input: Allows users to manually enter text content.

· Instant Document Creation: Converts the entered text into documents.

· Quick Testing: Useful for testing AI workflows without external files.

· Workflow Compatibility: Works with text splitters and downstream AI components.

32) Playwright Web Scraper

Loader used to scrape website content using the Playwright browser automation framework and convert the extracted data into documents.

Setup

• Document Loaders > drag Playwright Web Scraper node • Enter the URL of the website to be scraped • Use Manage Links if multiple pages need to be scraped • Configure Additional Parameters if required

You can now use the Playwright Web Scraper node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Playwright Web Scraper allows users to extract website content using the Playwright browser automation framework. It loads the specified webpage, renders dynamic content if required, and retrieves the text from the page. The extracted content is converted into structured documents that can be processed in AI workflows such as embeddings generation or knowledge base creation.

Features

· Browser-Based Crawling: Uses Playwright to load and process web pages.

· Dynamic Content Support: Can retrieve content from websites that use JavaScript rendering.

· Multi-Page Scraping: Supports scraping multiple URLs through the Manage Links option.

· Document Conversion: Converts scraped content into structured documents.

· Workflow Integration: Works with text splitters and downstream AI components.
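
After the browser renders the page, the scraper's remaining work is reducing HTML to readable text. A minimal sketch of that reduction step using only the standard library (the actual node's extraction rules may differ):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

html = ("<html><head><script>var x=1;</script></head>"
        "<body><h1>Docs</h1><p>Hello world</p></body></html>")
text = html_to_text(html)
```

Stripping script and style content matters because scraped pages otherwise pollute the document text with JavaScript and CSS that would waste embedding tokens.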

33) Puppeteer Web Scraper

Loader used to scrape website content using the Puppeteer browser automation library and convert the extracted data into documents.

Setup

• Document Loaders > drag Puppeteer Web Scraper node • Enter the URL of the website to be scraped • Use Manage Links if multiple pages need to be scraped • Configure Additional Parameters if required

You can now use the Puppeteer Web Scraper node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Puppeteer Web Scraper allows users to extract website content using the Puppeteer headless browser automation library. It loads the specified webpage, processes the HTML structure, and extracts text content from the page. The retrieved content is converted into structured documents that can be used in AI pipelines such as indexing, embeddings generation, and retrieval-based systems.

Features

· Headless Browser Scraping: Uses Puppeteer to scrape website content.

· Dynamic Page Rendering: Supports websites that load content dynamically.

· Multi-Page Scraping: Allows scraping multiple pages using the Manage Links option.

· Document Conversion: Converts webpage content into structured documents.

· Pipeline Compatibility: Works with text splitters and downstream AI processing components.

34) S3 Directory Loader

Loader used to retrieve multiple files from an Amazon S3 bucket directory and convert them into documents for AI processing.

Setup

• Document Loaders > drag S3 Directory node • Provide the AWS Credential required to access the S3 bucket • Enter the Bucket name from which files should be retrieved • Select the Region where the S3 bucket is hosted • Optionally specify the Server URL if using a custom S3 endpoint • Optionally define a Prefix to load files from a specific folder inside the bucket • Configure Additional Parameters if required

You can now use the S3 Directory Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The S3 Directory Loader allows users to retrieve multiple files stored inside a directory of an Amazon S3 bucket. It connects to AWS using the provided credentials and loads files from the specified bucket and prefix. The retrieved files are converted into structured documents which can then be used in AI workflows such as embeddings generation, indexing, or knowledge base creation.

Features

· AWS S3 Integration: Connects to Amazon S3 using AWS credentials.

· Directory-Based Retrieval: Loads files from a specified folder inside the bucket.

· Bulk File Processing: Supports processing multiple files stored in the bucket.

· Document Conversion: Converts file content into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI components.
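
The Prefix field narrows a bucket listing to one "folder", since S3 has no real directories, only keys that share a prefix. A small sketch of that selection logic (the real node lists objects through the AWS SDK; the keys below are hypothetical):

```python
def select_keys(object_keys, prefix=""):
    """Keep only keys under the prefix, skipping folder placeholder entries."""
    return [key for key in object_keys
            if key.startswith(prefix) and not key.endswith("/")]

keys = ["reports/", "reports/2024/q1.txt", "reports/2024/q2.txt", "logs/app.log"]
selected = select_keys(keys, prefix="reports/")
```

Each selected key would then be downloaded and converted into a document, which is why a tight prefix keeps large buckets manageable.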

35) S3 Loader

Loader used to retrieve a specific file from an Amazon S3 bucket and convert its content into documents for AI processing.

Setup

• Document Loaders > drag S3 node • Provide the AWS Credential required to access the S3 bucket • Enter the Bucket name containing the file • Provide the Object Key of the file to be retrieved • Select the Region where the S3 bucket is hosted • Select the File Processing Method for how the file should be processed • Configure Additional Parameters if required

You can now use the S3 Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The S3 Loader allows users to retrieve individual files stored in Amazon S3 buckets. It connects to the AWS S3 service and downloads the specified file using the object key. The file content is then processed and converted into structured documents that can be used in AI pipelines such as embeddings generation, indexing, or retrieval-based systems.

Features

· AWS S3 Integration: Connects directly to Amazon S3 using AWS credentials.

· File-Level Retrieval: Retrieves specific files using the object key.

· Flexible File Processing: Supports different file processing methods.

· Document Conversion: Converts file content into structured documents.

· Workflow Integration: Works with text splitters and downstream AI components.

36) SearchApi Web Search Loader

Loader used to retrieve web search results using SearchApi and convert the results into documents for AI processing.

Setup

• Document Loaders > drag SearchApi for Web Search node • Connect Credential > click Create New • Provide the SearchApi API credentials • Enter the search Query to retrieve web results • Configure Additional Parameters if required

You can now use the SearchApi Web Search Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The SearchApi Web Search Loader allows users to retrieve web search results using the SearchApi service. It sends the specified query to the search API and retrieves relevant web results. The returned data is converted into structured documents that can be used in AI workflows such as knowledge retrieval, research automation, or question answering systems.

Features

· Web Search Integration: Retrieves web results using SearchApi.

· Query-Based Retrieval: Allows users to search for information using custom queries.

· Real-Time Information Access: Provides up-to-date web search results.

· Document Conversion: Converts search results into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI components.
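
The conversion step turns each search hit into its own document. The sketch below assumes a response shape common to search APIs (an `organic_results` list with `title`, `link`, and `snippet` fields); the actual SearchApi payload may differ:

```python
def results_to_documents(payload):
    """One document per organic result: title and snippet as content, link as metadata."""
    docs = []
    for result in payload.get("organic_results", []):
        docs.append({"pageContent": f'{result["title"]}\n{result.get("snippet", "")}',
                     "metadata": {"link": result["link"]}})
    return docs

# Hypothetical trimmed response for illustration.
payload = {"organic_results": [
    {"title": "THub docs", "link": "https://example.com", "snippet": "Loader guide"}]}
docs = results_to_documents(payload)
```

Keeping the link in metadata lets a retrieval pipeline cite the source URL alongside each answer.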

37) SerpApi Web Search Loader

Loader used to retrieve web search results using SerpApi and convert them into documents for AI processing.

Setup

• Document Loaders > drag SerpApi for Web Search node • Connect Credential > click Create New • Provide the SerpApi API credentials • Enter the search Query to retrieve web results • Configure Additional Parameters if required

You can now use the SerpApi Web Search Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The SerpApi Web Search Loader allows users to retrieve search results from various search engines using the SerpApi service. It sends the provided query to the API and retrieves relevant search results. The retrieved data is converted into structured documents which can be used in AI workflows such as research, indexing, or retrieval-based applications.

Features

· Search Engine Integration: Retrieves results from search engines through SerpApi.

· Query-Based Retrieval: Allows searching information using custom queries.

· Real-Time Web Data: Provides up-to-date web search results.

· Document Conversion: Converts search results into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI components.

38) Spider Document Loader

Loader used to scrape website content using the Spider API and convert the extracted data into documents for AI processing.

Setup

• Document Loaders > drag Spider Document Loader node • Connect Credential > click Create New • Provide the Spider API credentials • Select the Mode (for example, Scrape) • Enter the Web Page URL from which content should be retrieved • Set the Limit to define how many pages or results should be retrieved • Configure Additional Parameters if required

You can now use the Spider Document Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Spider Document Loader allows users to retrieve web content using the Spider crawling service. It connects to the Spider API and extracts content from the specified webpage or site. The retrieved content is then converted into structured documents that can be processed by AI pipelines such as embeddings generation, indexing, or knowledge base creation.

Features

· Web Scraping Support: Extracts content from specified web pages.

· Spider API Integration: Connects to the Spider service using API credentials.

· Configurable Crawling Mode: Supports scraping modes such as page scraping.

· Document Conversion: Converts scraped content into structured documents.

· Pipeline Integration: Works with text splitters and downstream AI workflows.

39) Text File Loader

Loader used to read content from text files and convert the extracted text into documents for AI processing.

Setup

• Document Loaders > drag Text File node • Upload the text file using the Upload File option • Configure Additional Parameters if required

You can now use the Text File Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Text File Loader allows users to upload plain text files and extract their content. The loader reads the uploaded file and converts the text into structured documents. These documents can then be used in AI workflows such as embeddings generation, indexing, or retrieval-based question answering.

Features

· Text File Import: Allows users to upload and process plain text files.

· Content Extraction: Extracts readable text from text files.

· Document Conversion: Converts text file content into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI components.


40) Unstructured File Loader

Loader used to process files using the Unstructured API and convert the extracted content into documents for AI processing.

Setup

• Document Loaders > drag Unstructured File Loader node • Connect Credential > click Create New • Provide the credentials required to access the Unstructured API • Upload the file using the Upload File option • Enter the Unstructured API URL if required • Configure Additional Parameters if needed

You can now use the Unstructured File Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Unstructured File Loader allows users to upload files and process them using the Unstructured document processing service. The loader sends the file to the configured Unstructured API endpoint, extracts structured text and metadata, and converts the processed output into documents. These documents can then be used in AI workflows such as embeddings generation, indexing, or retrieval-based systems.

Features

· Unstructured API Integration: Processes files using the Unstructured document processing service.

· Multi-Format Support: Supports processing of different document formats.

· Structured Content Extraction: Extracts readable text and metadata from uploaded files.

· Document Conversion: Converts processed data into structured documents.

· Workflow Compatibility: Works with text splitters and downstream AI components.
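
The Unstructured service returns a list of typed elements rather than raw text, and the loader's final step assembles those elements into documents. A sketch of that step; the element shape below is loosely modeled on Unstructured's output and is an assumption here:

```python
def elements_to_documents(elements, source):
    """Join extracted element texts; keep element types as lightweight metadata."""
    text = "\n\n".join(e["text"] for e in elements if e.get("text"))
    types = sorted({e["type"] for e in elements})
    return {"pageContent": text,
            "metadata": {"source": source, "element_types": types}}

# Example element list in the assumed shape.
elements = [{"type": "Title", "text": "Annual Report"},
            {"type": "NarrativeText", "text": "Revenue grew this year."}]
doc = elements_to_documents(elements, "report.docx")
```

Because the elements are typed, a fuller implementation could, for example, keep titles as section boundaries when chunking.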

41) Unstructured Folder Loader

Loader used to process multiple files from a folder using the Unstructured API and convert them into documents for AI processing.

Setup

• Document Loaders > drag Unstructured Folder Loader node • Connect Credential > click Create New • Provide the credentials required to access the Unstructured API • Enter the Folder Path containing the files to be processed • Enter the Unstructured API URL if required • Configure Additional Parameters if needed

You can now use the Unstructured Folder Loader node in THub.

Connections

• Text Splitter can be connected with any node under Text Splitter category

Explanation

The Unstructured Folder Loader allows users to process multiple files stored inside a folder using the Unstructured API. It scans the specified folder path, sends each file to the Unstructured processing service, and converts the extracted content into structured documents. These documents can then be used in AI pipelines such as knowledge base creation, embeddings generation, or document indexing.

Features

· Bulk File Processing: Processes multiple files stored inside a folder.

· Unstructured API Integration: Uses the Unstructured service for document processing.

· Structured Content Extraction: Extracts text and metadata from files.

· Document Conversion: Converts processed file content into structured documents.

· Pipeline Compatibility: Works with text splitters and downstream AI workflows.

42) VectorStore to Document Loader

Loader used to retrieve stored data from a vector store and convert the retrieved results into documents.

Setup

• Document Loaders > drag VectorStore To Document node • Select the Vector Store from which documents should be retrieved • Enter the Query used to search the vector store • Define the Minimum Score (%) to filter relevant results • Configure Additional Parameters if required

You can now use the VectorStore to Document Loader node in THub.

Connections

• Vector Store can be connected with any node under Vector Store category

Explanation

The VectorStore to Document Loader retrieves stored entries from a vector store based on a query and converts the retrieved results into documents. It performs similarity search using the query and returns documents that meet the defined minimum score threshold. These documents can then be used in downstream AI workflows such as retrieval-augmented generation or contextual question answering.

Features

· Vector Store Retrieval: Retrieves documents stored in a vector database.

· Query-Based Search: Uses similarity search to find relevant entries.

· Score Filtering: Filters results based on a minimum similarity score.

· Document Conversion: Converts retrieved vector store results into structured documents.

· AI Workflow Integration: Supports retrieval-augmented generation and knowledge retrieval systems.
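
The Minimum Score (%) setting acts as a similarity cutoff on the search results. A self-contained sketch of that filtering, using cosine similarity over toy two-dimensional vectors (real stores use high-dimensional embeddings and their own scoring):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def filter_by_min_score(query_vec, entries, min_score_pct):
    """Keep stored entries whose similarity to the query meets the threshold."""
    threshold = min_score_pct / 100.0
    results = []
    for text, vec in entries:
        score = cosine_similarity(query_vec, vec)
        if score >= threshold:
            results.append({"pageContent": text, "metadata": {"score": score}})
    return sorted(results, key=lambda d: -d["metadata"]["score"])

entries = [("close match", [1.0, 0.0]), ("off topic", [0.0, 1.0])]
docs = filter_by_min_score([1.0, 0.1], entries, min_score_pct=75)
```

Raising the minimum score trades recall for precision: fewer documents pass, but those that do are more likely to be genuinely relevant to the query.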
