whatsoRAG - design your ingest, extract, and RAG pipeline

Made by Meagan McKeever

What is possible

Drag a piece onto the canvas.

Sources

File

PDF

Photo

Audio

Video

Web page

Spreadsheet

Identify

Detect language

Multimodal

Categorize

Identify fields

Identify source

Identify topic

Identify origin

Scan contents

Read

Parse

Convert

Screenshot

Research further

Extract text

Extract table

OCR

Vision

Describe

Transcribe

Freeze frame

Make searchable

Chunk

Overlap

Embed

Vectorize

Index

Summary tree

Resolve

Dedup

Normalize

Entity

Relationship

Canonical

Define

Name

Enrich

Bounding box

Fingerprint

Flag gaps

Cross reference

Stores

Vector database

Relational database

Graph database

Hypergraph

Hierarchy

Order aware

Episodic memory

Semantic memory

Retrieve

Translate query

Embed query

Route

Look up

Path expansion

Fusion

Hyperedge filtration

Hops

Constraints

Priority queue

Score and prune

Semantic similarity

Temporal coherence

Spatial overlap

Structural importance

Weighted simulation

Context segment extraction

Dedup hits

Cross-encoder rerank

Provenance enrichment

Ranked answer to bot

Query and output

Query

Rerank

Dashboard

List

Render

Output json

Output md

Output answer

Output translate

Output encrypted

RAG type

RAG

GraphRAG

HypergraphRAG

RAGAnything

RAPTOR

HippoRAG


Drag pieces from the left to start building your pipeline.
Connect them by dragging from the edge of one piece to another piece.

Reality check

Add pieces, and this panel will tell you whether the pipeline works.

Recommended retrieval

Hybrid RAG: dense + keyword (BM25) + rerank, with contextual chunking

high confidence

Mostly text and semantic search. The strong, simple baseline is hybrid retrieval plus reranking, with contextual chunking: it usually beats naive vector-only retrieval and is cheaper to build and run than a graph.
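A minimal sketch of the hybrid step, for illustration only. It assumes whitespace tokenization, a common BM25 variant (Lucene-style idf, k1=1.5, b=0.75), toy hand-written vectors in place of a real embedding model, and reciprocal rank fusion (RRF, k=60) as the fusion rule. All function names are hypothetical, not part of whatsoRAG.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """BM25 score of each doc for the query (whitespace tokens are an assumption)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))  # docs containing each term
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank(scores):
    """Doc indices sorted best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first rankings: each doc gets sum(1 / (k + position))."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]

docs = [
    "refund policy for damaged items",
    "shipping times to europe",
    "how to return a damaged product",
]
# Hypothetical 3-dim embeddings; a real pipeline would get these from an embedding model.
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query, query_vec = "return damaged item", [0.85, 0.25, 0.05]

keyword_ranking = rank(bm25_scores(query, docs))
dense_ranking = rank([cosine(query_vec, v) for v in doc_vecs])
fused = reciprocal_rank_fusion([keyword_ranking, dense_ranking])  # [2, 0, 1]
```

RRF only looks at rank positions, so BM25 scores and cosine similarities never need to share a scale. The fused list is what a reranker would then rescore.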

Why (from your canvas)

text-first, semantic search

Alternatives

RAPTOR (if documents are long and questions need the big picture)
GraphRAG / HippoRAG (if you add entities + relationships)
Vision-native (if documents are visually rich)

Always-good baseline

Hybrid retrieval: dense (semantic) + keyword (BM25), then rerank.
Contextual or late chunking so passages keep their context.
Matryoshka embeddings so you can trade accuracy for speed/storage.
Keep a source trail (provenance) on every fact for citations.
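A minimal sketch of the Matryoshka trade-off. It assumes the embedding model was trained with Matryoshka representation learning, because only then do the leading dimensions carry meaning on their own. The helper name and the 8-dim vector are hypothetical.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` coordinates of a Matryoshka-trained embedding and
    re-normalize to unit length so cosine / dot-product search still works."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

full = [0.5, 0.5, 0.5, 0.5, 0.1, -0.1, 0.2, 0.0]  # hypothetical 8-dim embedding
small = truncate_embedding(full, 4)                # [0.5, 0.5, 0.5, 0.5]
```

At real sizes, cutting a 1536-dim vector to 256 dims stores 6 times fewer floats per chunk, at some cost in accuracy. Truncating an embedding from a model not trained this way degrades unpredictably.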

Tuning (mock)

Chunk size: 600 tokens
Overlap: 12%
Embed size: 1536 dims
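A minimal sketch of what these tuning values mean for chunking: 12% of 600 tokens is a 72-token overlap, so each window starts 528 tokens after the previous one. It assumes the document is already a list of tokens (a real pipeline would use the embedding model's tokenizer). The context header is a simplified stand-in for contextual chunking; some setups have an LLM write that header instead. Function names are hypothetical.

```python
def chunk_tokens(tokens, size=600, overlap_ratio=0.12):
    """Split a token list into windows of `size` tokens; consecutive windows
    share round(size * overlap_ratio) tokens."""
    overlap = round(size * overlap_ratio)  # 600 * 0.12 = 72 tokens
    stride = size - overlap                # next window starts 528 tokens later
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end
            break
    return chunks

def contextualize(chunks, doc_title, section):
    """Prepend a short header (hypothetical format) so each chunk keeps its context."""
    return [f"{doc_title} > {section}\n" + " ".join(c) for c in chunks]

tokens = [f"t{i}" for i in range(1200)]  # stand-in for a tokenized document
chunks = chunk_tokens(tokens)            # 3 chunks: 600, 600 and 144 tokens
texts = contextualize(chunks, "Handbook", "Returns")
```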

Rerank

Over time

Day one

Slide forward: more document types and sources arrive. Recall grows, cost and storage grow, and outcomes expand.
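A back-of-envelope sketch of why storage grows with sources. It assumes one float32 embedding per chunk and the mock tuning values (600-token chunks, 12% overlap, 1536 dims). It ignores index structures, metadata, and the stored text, so a real store would be larger. Function names are hypothetical.

```python
import math

def chunks_per_doc(tokens_per_doc, chunk_size=600, overlap_ratio=0.12):
    """Chunk count for a sliding window with the given overlap."""
    if tokens_per_doc <= chunk_size:
        return 1
    stride = chunk_size - round(chunk_size * overlap_ratio)
    return math.ceil((tokens_per_doc - chunk_size) / stride) + 1

def vector_storage_bytes(num_docs, tokens_per_doc, dims=1536, bytes_per_float=4, **chunking):
    """Raw embedding storage only: docs x chunks per doc x dims x bytes per float."""
    return num_docs * chunks_per_doc(tokens_per_doc, **chunking) * dims * bytes_per_float

# 10,000 docs of 1,200 tokens -> 3 chunks each -> 30,000 x 6,144 bytes = 184,320,000 bytes
estimate = vector_storage_bytes(10_000, 1_200)
```

Raw vector storage scales linearly with the number of chunks, so doubling the sources roughly doubles it.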

Effect on the output

Precision: 93

Recall: 83

Speed: 40

Cost: 85

Storage: 85
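The scores above are mock values. Against a labeled set of relevant passages, retrieval precision and recall are commonly measured at a cutoff k, as in this minimal sketch. The chunk IDs are hypothetical.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision@k: share of the top-k results that are relevant.
    Recall@k: share of all relevant items that appear in the top k."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0  # divides by results actually returned
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["c7", "c2", "c9", "c4", "c1"]  # hypothetical ranked chunk IDs
relevant = {"c2", "c4", "c8"}               # hypothetical ground-truth labels
p, r = precision_recall_at_k(retrieved, relevant, k=5)  # p = 0.4, r = 2/3
```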

What you get

Add a store or a RAG type to see what you get out.

Benefits

Rerank filters out 'looks similar but wrong' results.
Contextual chunking keeps each passage's context, so retrieval misses fewer relevant passages.

Everything is brought together into one place to query.
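A minimal sketch of the rerank stage: take the first-stage candidates, score each (query, passage) pair jointly, and keep the best few. In practice `score_pair` would be a cross-encoder model. The token-overlap scorer here is a hypothetical stand-in so the example runs without a model, and it is far weaker than a real cross-encoder at catching "looks similar but wrong" passages.

```python
from typing import Callable, List

def rerank(query: str, candidates: List[str],
           score_pair: Callable[[str, str], float], top_k: int = 3) -> List[str]:
    """Second-stage rerank: score each (query, passage) pair and keep the top_k."""
    ranked = sorted(candidates, key=lambda passage: score_pair(query, passage), reverse=True)
    return ranked[:top_k]

def toy_pair_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query terms in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

candidates = [
    "how to return a damaged product",
    "damaged in transit: shipping insurance",
    "return policy for damaged items and refunds",
]
best = rerank("return damaged items", candidates, toy_pair_score, top_k=2)
# ["return policy for damaged items and refunds", "how to return a damaged product"]
```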