Sources · PrivateMind Docs

A source is a piece of content the chat plane can search against, most often a file (PDF, DOCX, plain text, etc.) that has been extracted, chunked, and embedded into a searchable index at ingest time. Two read endpoints expose the sources attached to a user or to a specific conversation, and one write endpoint ingests files directly.

List sources

GET /v1/sources returns every source the calling user can see: files they own, group-shared sources, and org-level sources surfaced by an admin.

cURL

curl -s "https://api.privatemind.com/v1/sources" \
  -H "Authorization: Bearer $PMIND_USER_KEY"

Response:

JSON

{
  "success": true,
  "total": 3,
  "body": [
    {
      "id": 482,
      "source_type": "vectorized",
      "source_name": "file_quarterly-report.pdf",
      "source_description": "File \"quarterly-report.pdf\"",
      "handler": "vectorFileHandler",
      "source_config": {
        "conversation_id": 1207,
        "collection_name": "vec_482",
        "file_id": 482,
        "file_name": "quarterly-report.pdf",
        "file_type": "application/pdf",
        "file_size": 184302,
        "chunks_count": 47,
        "stored_at": "2026-05-20T14:02:11.482Z"
      },
      "conversation_id": 1207,
      "is_active": true,
      "use_in_tasks": true,
      "created_at": "2026-05-20T14:02:11.482Z",
      "updated_at": "2026-05-20T14:02:11.482Z"
    }
  ]
}

Fields worth knowing:

Field	Meaning
`source_type`	`vectorized` for RAG files, `tabular` for spreadsheets, `custom` / `mcp_web` for org-level connectors.
`handler`	Backend handler that runs against the source at retrieval time. `vectorFileHandler` for vectorized files.
`source_config`	Handler-specific config. For indexed files: the collection name, file metadata, chunk count. Secrets are redacted unless the caller owns the source.
`conversation_id`	The conversation this source row is attached to, or `null` for an org-level source not bound to any one chat.
`is_active`	Whether the source is currently considered "on" for the conversation it's mapped to.
`use_in_tasks`	Whether agentic task runs should include this source in tool dispatch.

A single source can be attached to multiple conversations, so it may appear more than once in the list.
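
Because a source can come back once per conversation it is attached to, a client that wants one entry per source has to collapse the rows itself. The sketch below shows one way to do that. `conversations_by_source` is a hypothetical helper, the ids 1311 and 77 are made up for illustration, and the code assumes that repeated rows for one source share the same `id`, which the sample response suggests but this page does not state outright.

```python
from collections import defaultdict


def conversations_by_source(rows):
    """Map each source id to the sorted conversation ids it is attached to.

    `rows` is the `body` array from GET /v1/sources. Org-level rows carry
    conversation_id = None and end up with an empty list.
    """
    grouped = defaultdict(set)
    for row in rows:
        conversations = grouped[row["id"]]  # creates the entry even for org-level rows
        if row.get("conversation_id") is not None:
            conversations.add(row["conversation_id"])
    return {source_id: sorted(ids) for source_id, ids in grouped.items()}


# Illustrative rows, trimmed to the two fields the helper reads.
rows = [
    {"id": 482, "conversation_id": 1207},
    {"id": 482, "conversation_id": 1311},
    {"id": 77, "conversation_id": None},
]
print(conversations_by_source(rows))  # {482: [1207, 1311], 77: []}
```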

List sources on a conversation

GET /v1/conversations/{id}/sources narrows the result to one conversation. Only sources actively mapped to that conversation come back.

cURL

curl -s "https://api.privatemind.com/v1/conversations/1207/sources" \
  -H "Authorization: Bearer $PMIND_USER_KEY"

Rows have the same shape as in GET /v1/sources. The endpoint returns 404 if the conversation doesn't belong to the calling user.

Ingesting a file

POST /v1/conversations/{id}/files/vectorize takes a single file as multipart form data, extracts its text, chunks it, embeds the chunks, and writes both the vectors and a source row in a single request. Pass 0 (or any non-existent id) as {id} and the server creates a new conversation and returns its id.

cURL

curl -s "https://api.privatemind.com/v1/conversations/0/files/vectorize" \
  -H "Authorization: Bearer $PMIND_USER_KEY" \
  -F "file=@quarterly-report.pdf"

Successful response (201):

JSON

{
  "success": true,
  "message": "File vectorized with 47 chunks successfully",
  "total": {
    "conversation_id": 1207,
    "vectorized_file": {
      "id": 482,
      "file_name": "quarterly-report.pdf",
      "collection_name": "vec_482",
      "file_type": "application/pdf",
      "file_size": 184302,
      "created_at": "2026-05-20T14:02:11.482Z"
    }
  }
}

If the same file (same SHA-256) is uploaded again by the same user, the server skips re-embedding and links the existing source to the new conversation. The response carries total.duplicate: true and the existing file_id.
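
A client usually needs three things from this response: the conversation id, the file id, and whether the upload was a duplicate. The sketch below extracts them. `summarize_ingest` is a hypothetical helper. The fresh-upload shape follows the sample above; the exact placement of `file_id` in a duplicate response is not shown on this page, so the code assumes it sits directly under `total`, and the duplicate payload in the example is invented for illustration.

```python
def summarize_ingest(response):
    """Pull the useful identifiers out of a vectorize response body."""
    payload = response.get("total") or {}
    file_info = payload.get("vectorized_file") or {}
    return {
        "conversation_id": payload.get("conversation_id"),
        # Fresh uploads nest the id under vectorized_file. Reading file_id
        # from the top level of `total` for duplicates is an assumption.
        "file_id": file_info.get("id", payload.get("file_id")),
        "duplicate": bool(payload.get("duplicate", False)),
    }


fresh = {
    "success": True,
    "total": {
        "conversation_id": 1207,
        "vectorized_file": {"id": 482, "file_name": "quarterly-report.pdf"},
    },
}
# Hypothetical duplicate payload; only `duplicate` and `file_id` are documented.
repeat = {"success": True, "total": {"conversation_id": 1350, "duplicate": True, "file_id": 482}}

print(summarize_ingest(fresh))
print(summarize_ingest(repeat))
```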

Form fields

Field	Required	Notes
`file`	yes	Single file part. The multipart key must be exactly `file`.
`?ephemeral=true` (query parameter)	no	Marks the auto-created conversation as ephemeral so that the chat-cleanup job purges it.
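
The path segment and the optional query flag can be assembled in one place so callers don't hand-build URLs. This is a minimal sketch: `vectorize_url` and `BASE_URL` are hypothetical names, and the base URL is taken from the cURL samples on this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.privatemind.com"  # as used in the cURL samples


def vectorize_url(conversation_id=0, ephemeral=False):
    """Build the ingest URL; conversation_id=0 asks for a new conversation."""
    path = f"{BASE_URL}/v1/conversations/{int(conversation_id)}/files/vectorize"
    # The flag only affects a conversation the server auto-creates.
    query = urlencode({"ephemeral": "true"}) if ephemeral else ""
    return f"{path}?{query}" if query else path


print(vectorize_url())
print(vectorize_url(0, ephemeral=True))
```

The file itself still has to be sent as a multipart part named `file`, as in the cURL example.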

Supported file types

Text extracted in-process by the backend before embedding:

Plain text: .txt, .md, .markdown, .log, .json, .jsonl, .ndjson, .geojson, .xml, .xhtml, .yaml, .yml, .sql, .py, .js, .ts, .r, .dat
Documents: .pdf, .docx, .pptx, .odt, .odp, .rtf, .html, .htm

Legacy .doc (OLE binary) is rejected: re-save as .docx. Spreadsheets (.csv, .xlsx, .ods, etc.) and images go through separate /files/tabular and /files/ocr ingestion routes, not documented here.
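
A client can check the extension before uploading and avoid a round trip that ends in a 400. The sketch below encodes the lists from this section. `ingest_route_hint` is a hypothetical helper, the spreadsheet set holds only the three extensions named above, and whether the server decides by extension alone (rather than by inspecting content) is an assumption.

```python
from pathlib import Path

# Extension lists copied from this page.
PLAIN_TEXT = {
    ".txt", ".md", ".markdown", ".log", ".json", ".jsonl", ".ndjson",
    ".geojson", ".xml", ".xhtml", ".yaml", ".yml", ".sql", ".py", ".js",
    ".ts", ".r", ".dat",
}
DOCUMENTS = {".pdf", ".docx", ".pptx", ".odt", ".odp", ".rtf", ".html", ".htm"}
SPREADSHEETS = {".csv", ".xlsx", ".ods"}  # partial: the page lists these "etc."


def ingest_route_hint(filename):
    """Guess how a file should be handled, based on its extension only."""
    ext = Path(filename).suffix.lower()
    if ext in PLAIN_TEXT or ext in DOCUMENTS:
        return "vectorize"
    if ext == ".doc":
        return "rejected"  # legacy OLE binary: re-save as .docx
    if ext in SPREADSHEETS:
        return "tabular"   # separate /files/tabular route
    return "unknown"       # may be an image (/files/ocr) or unsupported


for name in ["quarterly-report.pdf", "Notes.MD", "old-memo.doc", "data.csv"]:
    print(name, "->", ingest_route_hint(name))
```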

Size limits

75 MB per request (total body size cap).
50 MB per file. Files between 50 MB and 75 MB pass the request-size check but are then rejected with 413 File too large for extraction.
Extracted text above 1,000,000 characters is truncated before embedding; the response message notes the truncation.
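
The two size caps interact, so a local pre-check can tell the cases apart before any bytes are sent. The sketch below is written under stated assumptions: `precheck_size` and `will_truncate` are hypothetical helpers, MB is taken to mean 1024 × 1024 bytes (the page does not say whether the limits are binary or decimal), and "characters" is treated as Python string length.

```python
MB = 1024 * 1024          # assumption: binary megabytes
FILE_CAP = 50 * MB        # per-file extraction cap
REQUEST_CAP = 75 * MB     # total request body cap
TEXT_CAP = 1_000_000      # extracted characters kept before truncation


def precheck_size(size_bytes):
    """Classify a file by the limit it would hit, using file size alone.

    The real request body is slightly larger than the file because of the
    multipart envelope, so sizes right at the request cap are approximate.
    """
    if size_bytes > REQUEST_CAP:
        return "request_too_large"
    if size_bytes > FILE_CAP:
        return "file_too_large"   # passes the request check, then gets 413
    return "ok"


def will_truncate(extracted_text):
    """True if the server would truncate this much extracted text."""
    return len(extracted_text) > TEXT_CAP


print(precheck_size(184302))     # ok
print(precheck_size(60 * MB))    # file_too_large
print(precheck_size(80 * MB))    # request_too_large
```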

Failure modes

Status	When
`400`	No `file` part, unsupported file type, or empty/invalid `conversation_id` segment.
`403`	Not a user-scoped key, or org has `file_attachments_enabled = false`.
`413`	File body exceeds the extractor's 50 MB cap.
`429`	Per-key rate limit exhausted.
`503`	The embedding model service is unreachable. Try again.
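
Of the statuses above, only 429 and 503 describe conditions that can clear on their own; 400, 403, and 413 will fail the same way on every attempt. The sketch below turns that split into a retry decision. `retry_delay` is a hypothetical helper, and the exponential backoff with a 1-second base and 30-second ceiling is an illustrative choice, not a documented server expectation.

```python
RETRYABLE = {429, 503}  # rate limit exhausted, embedding service unreachable


def retry_delay(status, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retry number `attempt` (0-based),
    or None when the status means retrying cannot help."""
    if status not in RETRYABLE:
        return None
    return min(cap, base * 2 ** attempt)


print(retry_delay(503, 0))   # 1.0
print(retry_delay(429, 3))   # 8.0
print(retry_delay(413, 0))   # None
```

A caller would loop while `retry_delay` returns a number, sleep for that long, and stop after a fixed number of attempts.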

How sources reach the model

Sources do not auto-attach to POST /v1/chat/completions. That endpoint only handles completion requests and never reads from the source index. Source-aware retrieval lives in the PrivateMind chat plane itself (the web app and the /v1/conversations/{id}/generate SSE endpoint used by the embeddable widget), where the system retrieves relevant chunks from indexed sources and adds them to the conversation context before calling the model.

Where next

Authentication: minting a user key from an application key via /v1/auth/exchange.
Embeddings: the same model that powers ingest, if you want to build your own vector index instead.
Chat completions: the pass-through completion path that does not read sources.