A source is a piece of content the chat plane can search against, most often a file (PDF, DOCX, plain text, etc.) that has been extracted, chunked, and embedded into a searchable index at ingest time. Two read endpoints expose what's attached to a user or to a specific conversation, and one write endpoint handles direct file ingestion.
List sources
GET /v1/sources returns every source the calling user can see: files they own, group-shared sources, and org-level sources surfaced by an admin.
curl -s "https://api.privatemind.com/v1/sources" \
  -H "Authorization: Bearer $PMIND_USER_KEY"

Response:

{
  "success": true,
  "total": 3,
  "body": [
    {
      "id": 482,
      "source_type": "vectorized",
      "source_name": "file_quarterly-report.pdf",
      "source_description": "Vectorized file \"quarterly-report.pdf\" containing searchable content",
      "handler": "vectorFileHandler",
      "source_config": {
        "conversation_id": 1207,
        "collection_name": "vec_482",
        "file_id": 482,
        "file_name": "quarterly-report.pdf",
        "file_type": "application/pdf",
        "file_size": 184302,
        "chunks_count": 47,
        "stored_at": "2026-05-20T14:02:11.482Z"
      },
      "conversation_id": 1207,
      "is_active": true,
      "use_in_tasks": true,
      "created_at": "2026-05-20T14:02:11.482Z",
      "updated_at": "2026-05-20T14:02:11.482Z"
    }
  ]
}

Fields worth knowing:
| Field | Meaning |
|---|---|
| source_type | vectorized for RAG files, tabular for spreadsheets, custom / mcp_web for org-level connectors. |
| handler | Backend handler that runs against the source at retrieval time. vectorFileHandler for vectorized files. |
| source_config | Handler-specific config. For indexed files: the collection name, file metadata, chunk count. Secrets are redacted unless the caller owns the source. |
| conversation_id | The conversation this source row is attached to, or null for an org-level source not bound to any one chat. |
| is_active | Whether the source is currently considered "on" for the conversation it's mapped to. |
| use_in_tasks | Whether agentic task runs should include this source in tool dispatch. |
A single source can be attached to multiple conversations, so it may appear more than once in the list.
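
Because one source can show up in several rows, a client may want to collapse the list before displaying it. The following is a minimal sketch, assuming repeated rows share the same id and differ only in conversation_id; the sample rows (conversation 1311, source 77, "org-wiki") are invented for illustration and the helper name is hypothetical.

```python
from collections import defaultdict


def group_by_source(rows):
    """Collapse list rows that share a source id into one entry per source."""
    conversations = defaultdict(list)
    first_seen = {}
    for row in rows:
        # Keep the first row for each id as the representative record.
        first_seen.setdefault(row["id"], row)
        conversations[row["id"]].append(row.get("conversation_id"))
    return [
        {
            "id": source_id,
            "source_name": row["source_name"],
            "conversation_ids": conversations[source_id],
        }
        for source_id, row in first_seen.items()
    ]


# Illustrative rows shaped like the `body` array of GET /v1/sources.
rows = [
    {"id": 482, "source_name": "file_quarterly-report.pdf", "conversation_id": 1207},
    {"id": 482, "source_name": "file_quarterly-report.pdf", "conversation_id": 1311},
    {"id": 77, "source_name": "org-wiki", "conversation_id": None},
]
grouped = group_by_source(rows)
```

Here grouped holds two entries: source 482 with both conversation ids, and the org-level source with a single null attachment.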
List sources on a conversation
GET /v1/conversations/{id}/sources narrows the result to one conversation. Only sources actively mapped to that conversation come back.
curl -s "https://api.privatemind.com/v1/conversations/1207/sources" \
  -H "Authorization: Bearer $PMIND_USER_KEY"

Same row shape as GET /v1/sources. 404 if the conversation doesn't belong to the calling user.
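
For callers not using curl, the two read endpoints can be wrapped in one small helper. This is a sketch using only the Python standard library; the function names are hypothetical, and returning None on 404 is one possible client-side convention, not something the API prescribes.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.privatemind.com"


def build_sources_request(user_key, conversation_id=None):
    """Build a GET request for all visible sources, or for one conversation."""
    if conversation_id is None:
        path = "/v1/sources"
    else:
        path = f"/v1/conversations/{int(conversation_id)}/sources"
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {user_key}"},
        method="GET",
    )


def fetch_sources(user_key, conversation_id=None):
    """Return the `body` rows, or None when the server answers 404."""
    request = build_sources_request(user_key, conversation_id)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)["body"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Conversation does not belong to the calling user.
            return None
        raise
```

Calling fetch_sources(key) lists everything the user can see; fetch_sources(key, 1207) narrows the result to one conversation.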
Ingesting a file
POST /v1/conversations/{id}/files/vectorize takes a single file as multipart form data, extracts its text, chunks it, embeds the chunks, and writes both vectors and a source row in one shot. Pass id=0 (or any non-existent id) and the server will create a new conversation and return its id.
curl -s "https://api.privatemind.com/v1/conversations/0/files/vectorize" \
  -H "Authorization: Bearer $PMIND_USER_KEY" \
  -F "file=@quarterly-report.pdf"

Successful response (201):

{
  "success": true,
  "message": "File vectorized with 47 chunks successfully",
  "total": {
    "conversation_id": 1207,
    "vectorized_file": {
      "id": 482,
      "file_name": "quarterly-report.pdf",
      "collection_name": "vec_482",
      "file_type": "application/pdf",
      "file_size": 184302,
      "created_at": "2026-05-20T14:02:11.482Z"
    }
  }
}

If the same file (same SHA-256) is uploaded again by the same user, the server skips re-embedding and links the existing source to the new conversation. The response carries total.duplicate: true and the existing file_id.
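
A client can compute the same kind of digest locally to keep its own record of what it has already uploaded, and can check the response for the duplicate flag. The sketch below assumes the server hashes the raw file bytes, which the text above implies but does not state; both helper names are hypothetical, and only the total.duplicate field is taken from the description above.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(response: dict) -> bool:
    """True when a vectorize response reports that the upload was deduplicated."""
    total = response.get("total") or {}
    return isinstance(total, dict) and total.get("duplicate") is True


# Example: the digest of a tiny payload, and a response without the flag.
digest = sha256_hex(b"abc")
fresh = is_duplicate({"success": True, "total": {"conversation_id": 1207}})
```

The digest is useful only for client-side bookkeeping; the endpoint itself takes the file, not the hash.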
Form fields
| Field | Required | Notes |
|---|---|---|
| file | yes | Single file part. The multipart key must be exactly file. |
| query ?ephemeral=true | no | Marks the auto-created conversation as ephemeral; purged by the chat-cleanup job. |
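
The upload request can also be assembled by hand, which makes the two inputs above explicit: a single multipart part named file, and an optional ephemeral query parameter. This is a minimal sketch with the standard library; the helper name is hypothetical, and it assumes simple filenames without quotes or line breaks.

```python
import urllib.parse
import uuid


def build_vectorize_upload(conversation_id, filename, data, content_type, ephemeral=False):
    """Return (url, headers, body) for POST /v1/conversations/{id}/files/vectorize."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        # The multipart key must be exactly "file".
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    url = (
        "https://api.privatemind.com"
        f"/v1/conversations/{int(conversation_id)}/files/vectorize"
    )
    if ephemeral:
        url += "?" + urllib.parse.urlencode({"ephemeral": "true"})

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return url, headers, head + data + tail


url, headers, body = build_vectorize_upload(
    0, "notes.txt", b"hello", "text/plain", ephemeral=True
)
```

The caller still has to add the Authorization header and send the request as a POST, for example with urllib.request.Request(url, data=body, headers=headers, method="POST").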
Supported file types
Text extracted in-process by the backend before embedding:
- Plain text: .txt, .md, .markdown, .log, .json, .jsonl, .ndjson, .geojson, .xml, .xhtml, .yaml, .yml, .sql, .py, .js, .ts, .r, .dat
- Documents: .pdf, .docx, .pptx, .odt, .odp, .rtf, .html, .htm
Legacy .doc (OLE binary) is rejected: re-save as .docx. Spreadsheets (.csv, .xlsx, .ods, etc.) and images go through separate /files/tabular and /files/ocr ingestion routes, not documented here.
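
A client can pick the ingestion route from the file extension before uploading. The sketch below encodes only the extensions named above; matching case-insensitively is an assumption, and because the spreadsheet list ends in "etc." and no image extensions are given, anything unlisted is reported as unknown, not guessed.

```python
from pathlib import PurePath

# Extensions listed for the /files/vectorize route.
VECTORIZE_EXTENSIONS = {
    ".txt", ".md", ".markdown", ".log", ".json", ".jsonl", ".ndjson",
    ".geojson", ".xml", ".xhtml", ".yaml", ".yml", ".sql", ".py", ".js",
    ".ts", ".r", ".dat",
    ".pdf", ".docx", ".pptx", ".odt", ".odp", ".rtf", ".html", ".htm",
}

# Only the spreadsheet extensions the text names explicitly.
TABULAR_EXTENSIONS = {".csv", ".xlsx", ".ods"}


def ingestion_route(filename):
    """Return 'vectorize', 'tabular', 'rejected', or 'unknown' for a filename."""
    ext = PurePath(filename).suffix.lower()
    if ext == ".doc":
        return "rejected"  # legacy OLE binary: re-save as .docx
    if ext in VECTORIZE_EXTENSIONS:
        return "vectorize"
    if ext in TABULAR_EXTENSIONS:
        return "tabular"
    return "unknown"
```

For example, ingestion_route("Q3.PDF") selects the vectorize route, while ingestion_route("old.doc") flags the file for conversion first.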
Size limits
- 75 MB per request (total body size cap).
- 50 MB per file. Files between 50 MB and 75 MB will pass the request check and be rejected with 413 File too large for extraction.
- Extracted text above 1,000,000 characters is truncated before embedding; the response message says so.
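
These limits can be checked on the client before any bytes are sent. The sketch below assumes 1 MB means 1024 × 1024 bytes, which the limits above do not specify, and the function names are hypothetical. The request cap applies to the whole body, so the multipart framing counts against it as well.

```python
MB = 1024 * 1024  # assumption: binary megabytes

REQUEST_CAP = 75 * MB        # total request body
FILE_CAP = 50 * MB           # extractor's per-file cap
TEXT_CAP = 1_000_000         # characters of extracted text kept for embedding


def precheck_size(file_size):
    """Classify a file size (in bytes) against the documented caps."""
    if file_size > REQUEST_CAP:
        return "request_too_large"
    if file_size > FILE_CAP:
        return "file_too_large"  # would pass the request check, then get a 413
    return "ok"


def will_truncate(text):
    """True when extracted text is longer than the embedding cap."""
    return len(text) > TEXT_CAP
```

The sample file from the responses above (184302 bytes) passes both checks comfortably.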
Failure modes
| Status | When |
|---|---|
| 400 | No file part, unsupported file type, or empty/invalid conversation_id segment. |
| 403 | Not a user-scoped key, or org has file_attachments_enabled = false. |
| 413 | File body exceeds the extractor's 50 MB cap. |
| 429 | Per-key rate limit exhausted. |
| 503 | The embedding model service is unreachable. Try again. |
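
One way to act on this table is to retry only the statuses that can succeed on a later attempt. The sketch below treats 429 and 503 as retryable and everything else as final; the table only says "Try again" for 503, so retrying on 429 and the exponential backoff schedule are conventional client choices, not documented server behavior.

```python
RETRYABLE_STATUSES = {429, 503}


def should_retry(status):
    """True for failures that may clear up without changing the request."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... limited by cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


delays = backoff_delays(4)
```

A 400, 403, or 413 needs a change to the request (a different file, key, or org setting), so repeating the same upload will not help.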
How sources reach the model
Sources do not auto-attach to POST /v1/chat/completions. That endpoint only handles completion requests and never reads from the source index. Source-aware retrieval lives in the PrivateMind chat plane itself (the web app and the /v1/conversations/{id}/generate SSE endpoint used by the embeddable widget), where the system retrieves relevant chunks from indexed sources and adds them to the conversation context before calling the model.
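
The retrieval step described above can be pictured with a toy example. This is a conceptual sketch only: the similarity metric, the number of chunks, and the data shapes are all assumptions made for illustration, and nothing here reflects how the chat plane is actually implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, chunks, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vector, chunk["vector"]),
        reverse=True,
    )
    return [chunk["text"] for chunk in ranked[:k]]


# Made-up two-dimensional embeddings standing in for indexed chunks.
chunks = [
    {"text": "revenue", "vector": [1.0, 0.0]},
    {"text": "headcount", "vector": [0.0, 1.0]},
    {"text": "margin", "vector": [0.9, 0.1]},
]
top = retrieve([1.0, 0.0], chunks, k=2)
```

In a source-aware flow, passages selected this way would be added to the conversation context before the model is called; the pass-through completions endpoint skips this step entirely.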
Where next
- Authentication: minting a user key from an application key via /v1/auth/exchange.
- Embeddings: the same model that powers ingest, if you want to build your own vector index instead.
- Chat completions: the pass-through completion path that does not read sources.