Security & Privacy
For IT managers, CTOs, and procurement teams evaluating document automation for sensitive environments. This page describes exactly what happens to your files — without marketing language.
Every conversion — whether triggered manually, via API, or by a folder watch — follows the same path.
You authorize Vlkea Parse to access a specific folder in Google Drive, OneDrive, or Dropbox using OAuth. You choose which folder is watched and where converted files are written. We request the narrowest scope each provider offers. Where a provider has no folder-level scope, the token covers more than the chosen folder; the cloud storage access section below explains what that means.
When a conversion runs, your file is downloaded to a RAM-backed temporary filesystem (tmpfs). It never touches disk storage. In production, this is enforced at the configuration level — the service refuses to start if the temporary directory is not on a memory filesystem.
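A startup check of this kind could look roughly like the sketch below. It is not the actual Vlkea Parse code. It assumes a Linux host with a `/proc/mounts`-style mount table, and the function names and the set of filesystem types treated as RAM-backed are illustrative choices.

```python
import os
from typing import Optional

# Assumption: these filesystem types count as RAM-backed.
MEMORY_FILESYSTEMS = {"tmpfs", "ramfs"}


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the deepest mount point containing path."""
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Mount table columns: device, mount point, filesystem type, ...
        # (escaped spaces in mount points are not handled in this sketch)
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def require_memory_tmpdir(path: str, mounts_file: str = "/proc/mounts") -> None:
    """Abort startup unless path lives on a memory filesystem."""
    with open(mounts_file) as f:
        fs_type = filesystem_type(path, f.read())
    if fs_type not in MEMORY_FILESYSTEMS:
        raise SystemExit(
            f"refusing to start: {path} is on {fs_type!r}, not a memory filesystem"
        )
```

Calling `require_memory_tmpdir` once at process start, before any job is accepted, makes a misconfigured temporary directory a hard failure rather than a silent write to disk.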
The file's binary signature (magic bytes) is checked against its declared type. Files that don't match are rejected before any conversion attempt. File extensions are not trusted.
For DOCX, HTML, EPUB, RTF, ODT, and similar formats: conversion runs in an isolated subprocess. For PDFs: a dedicated GPU processing service is used — explained in detail below.
The resulting Markdown is sanitized before any write-back, removing patterns that could cause issues downstream in your pipeline.
The Markdown file is written to the output folder you specified in your cloud storage. The in-memory buffer is released. Nothing is retained on our systems. A cleanup task runs every 10 minutes to remove any orphaned temporary files left by interrupted jobs.
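The orphan cleanup described above can be sketched as a function that a scheduler calls every 10 minutes. The function name, the age threshold parameter, and the choice to look only at regular files in a single directory are assumptions made for illustration.

```python
import os
import time
from typing import List, Optional


def remove_orphans(tmp_dir: str, max_age_s: float, now: Optional[float] = None) -> List[str]:
    """Delete regular files in tmp_dir older than max_age_s; return their names."""
    now = time.time() if now is None else now
    with os.scandir(tmp_dir) as it:
        entries = list(it)  # snapshot first, then delete
    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_s:
            os.unlink(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

The age threshold should be longer than the slowest legitimate conversion, so that a running job's file is never mistaken for an orphan.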
We store operational metadata to power conversion history and enforce quotas. Document content is never stored.
We store
We never store
This section deserves special attention from security-conscious evaluators: it explains why PDFs go to a dedicated service, and exactly what that service sees.
Many PDFs cannot be accurately converted using traditional text extraction. A large share of real-world PDFs (contracts, financial reports, clinical documents) store text as rendered glyphs, not as machine-readable characters. For those files the page is effectively an image, and accurate structural extraction requires a vision model that can read the page the same way a human would.
When you convert a PDF, here is exactly what happens:
We support on-premise deployment. The full conversion pipeline — including the vision model for PDFs — runs within your own infrastructure. Documents are processed on your servers, on your hardware. No data leaves your network at any stage. Get in touch to discuss.
Vlkea Parse uses OAuth to connect to your cloud storage. Here is the precise scope of that access.
Read access to list and download files from the folder(s) you select. Write access to create output Markdown files in the folder you choose. We request the narrowest scope each provider offers.
Our application code only reads from and writes to the folder(s) you select — we never query outside them. That said, the OAuth token itself grants broader access than just your chosen folder (this is a limitation of how Google Drive and OneDrive OAuth works — no folder-level scope exists). You can see exactly what access was granted and revoke it at any time from your Google, Microsoft, or Dropbox account security settings — independently of anything we say.
OAuth access and refresh tokens are encrypted at rest using envelope encryption — a unique encryption key is generated per token. Tokens are never stored in plain text.
Disconnect directly from your Google, Microsoft, or Dropbox account security settings — or from within Vlkea Parse settings. Revocation is immediate. We cannot read or write to your storage after that point.
Technical facts for reviewers.
Content is validated by magic bytes (binary signature inspection). File extensions are not trusted. Files that don't match their declared type are rejected before processing begins.
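A minimal sketch of a signature check follows. The table is illustrative and incomplete, and the function name is hypothetical. The byte prefixes themselves are the standard ones for PDF, ZIP-based containers, and RTF.

```python
# Leading bytes for a few declared types (illustrative, not exhaustive).
SIGNATURES = {
    "pdf": (b"%PDF-",),
    "docx": (b"PK\x03\x04",),  # DOCX, EPUB and ODT are all ZIP containers
    "epub": (b"PK\x03\x04",),
    "odt": (b"PK\x03\x04",),
    "rtf": (b"{\\rtf",),
}


def matches_declared_type(header: bytes, declared_type: str) -> bool:
    """True if the file's first bytes fit the type the caller declared."""
    prefixes = SIGNATURES.get(declared_type.lower())
    if prefixes is None:
        return False  # unknown types are rejected, not guessed
    return header.startswith(prefixes)
```

A ZIP prefix alone cannot tell DOCX from EPUB or ODT, so a fuller check would also look inside the archive. Text formats such as HTML have no fixed signature and need a different kind of validation.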
Each document is converted in a separate subprocess. A crash or failure in one conversion cannot affect others. PDF processing runs in a completely separate service.
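Per-document process isolation can be illustrated with the standard library alone. The sketch below is an assumption about the general shape, not the service's real worker code. The converter command is supplied by the caller, and `run_isolated` is a hypothetical name.

```python
import subprocess
from typing import List, Tuple


def run_isolated(argv: List[str], input_bytes: bytes, timeout_s: float = 60.0) -> Tuple[bool, bytes]:
    """Run one conversion in its own process and return (ok, stdout).

    A crash, non-zero exit, or timeout in the child is reported as a
    failure for this document only; the parent process keeps running.
    """
    try:
        result = subprocess.run(
            argv,
            input=input_bytes,
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, b""
    return result.returncode == 0, result.stdout
```

Passing the document over stdin and reading the result from stdout keeps the child from needing any path on disk. Stronger isolation (separate user, resource limits, no network) would be layered on top of this.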
Error messages use generic codes only. No file content appears in application logs, error responses, or monitoring reports. Error tracking is configured with PII disabled.
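One way to keep document content out of logs and responses is to translate every exception into a fixed code and never log the exception message. The codes, logger name, and function below are hypothetical examples of that pattern.

```python
import logging

logger = logging.getLogger("conversion")

# Hypothetical mapping from exception class to a generic code.
ERROR_CODES = {
    ValueError: "E_INVALID_INPUT",
    TimeoutError: "E_TIMEOUT",
}


def to_error_response(exc: Exception, job_id: str) -> dict:
    """Build a client-facing error that carries a code and nothing else."""
    code = next(
        (c for exc_type, c in ERROR_CODES.items() if isinstance(exc, exc_type)),
        "E_INTERNAL",
    )
    # Log the code and exception class only: str(exc) may quote document text.
    logger.error("job=%s code=%s type=%s", job_id, code, type(exc).__name__)
    return {"job_id": job_id, "error": code}
```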
Converted Markdown is sanitized before being written to your cloud storage. Malicious patterns introduced by document content cannot propagate to output files.
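The page does not list which patterns are removed. Purely as an illustration, the sketch below strips two patterns that are commonly treated as unsafe in Markdown: embedded `<script>` blocks and `javascript:` link targets. Both the choice of patterns and the regex approach are assumptions. A production sanitizer would more likely use a real parser with an allowlist.

```python
import re

# Assumed patterns; the actual rule set is not documented here.
SCRIPT_BLOCK = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)
JS_LINK = re.compile(r"\]\(\s*javascript:[^)]*\)", re.IGNORECASE)


def sanitize_markdown(text: str) -> str:
    """Remove script blocks and neutralize javascript: link targets."""
    text = SCRIPT_BLOCK.sub("", text)
    text = JS_LINK.sub("](#)", text)
    return text
```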
All state-changing requests require a CSRF token. The only exemptions are Bearer-authenticated API calls and health check endpoints.
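The rule above can be written as a single decision function. This is a sketch only. The header name `X-CSRF-Token`, the `/health` path, the treatment of GET, HEAD, and OPTIONS as non-state-changing, and the plain-dict header lookup are assumptions, not details taken from the product.

```python
import hmac
from typing import Mapping

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}  # assumed to be non-state-changing
EXEMPT_PATHS = {"/health"}                 # hypothetical health check path


def csrf_check_passes(method: str, path: str, headers: Mapping[str, str], session_token: str) -> bool:
    """Decide whether a request may proceed under the CSRF policy."""
    if method.upper() in SAFE_METHODS:
        return True
    if path in EXEMPT_PATHS:
        return True
    # Bearer-authenticated API calls do not rely on cookies, so they are exempt.
    if headers.get("Authorization", "").startswith("Bearer "):
        return True
    supplied = headers.get("X-CSRF-Token", "")
    # Constant-time comparison avoids leaking the token through timing.
    return bool(session_token) and hmac.compare_digest(
        supplied.encode(), session_token.encode()
    )
```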
Dashboard sessions use short-lived JWTs (15-minute TTL) with automatic renewal. The REST API authenticates with API keys, which are stored as bcrypt hashes. MCP integrations use OAuth 2.1 with PKCE.
No legal boilerplate. Here is what we actually do with your data.
If your organization's policy requires that documents never leave your own infrastructure — whether for regulatory compliance, data sovereignty, or internal security requirements — we support full on-premise deployment.
Questions? hello@vlkea.dev