Architecture¶

Overview¶

ContextOS is a pipeline. A repository path enters one end; a context pack exits the other. Each stage is a pure transformation — no side effects, no network calls, no mutation of the source repo.

repo path
    │
    ▼
┌──────────┐
│  Scanner │  walks files, respects .gitignore, classifies by language
└──────────┘
    │ list[FileResult]
    ▼
┌─────────────────┐
│ Summarizer      │  per-file metadata: language, size, token estimate
└─────────────────┘
    │ file_summaries.json
    ▼
┌──────────────────┐
│ DependencyGraph  │  static import analysis → edges between files
└──────────────────┘
    │ dependency_graph.json
    ▼
┌──────────────────┐
│ ContextSelector  │  task + budget → ranked FileResult list
└──────────────────┘
    │ ContextSelection
    ▼
┌──────────────────┐
│ SecretDetector   │  scan content → redact [REDACTED_*] tokens
└──────────────────┘
    │ redacted ContextSelection
    ▼
┌──────────────────┐
│ PackBuilder      │  render Markdown or JSON context pack
└──────────────────┘
    │ rendered string
    ▼
┌─────────────────────┐
│ Compression (opt.)  │  Headroom proxy → compressed string
└─────────────────────┘
    │
    ▼
context pack (file on disk)

Package Layout¶

contextos/
├── cli/
│   ├── main.py                 Typer app; registers sub-commands
│   └── commands/
│       ├── init.py             contextos init
│       ├── scan.py             contextos scan
│       ├── task.py             contextos task
│       ├── pack.py             contextos pack
│       ├── export.py           contextos export {claude,codex,cursor,aider}
│       └── memory.py           contextos memory {add,decision,list,compact}
│
├── core/
│   ├── scanner.py              File walking, binary detection, language classification
│   ├── summarizer.py           Per-file summaries → file_summaries.json
│   ├── dependency_graph.py     Import edge extraction → dependency_graph.json
│   ├── context_selector.py     Relevance ranking + budget enforcement
│   ├── pack_builder.py         Markdown/JSON rendering + disk write
│   ├── secret_detector.py      14-pattern regex engine for secret redaction
│   ├── compression.py          CompressionProvider ABC + factory
│   ├── headroom_adapter.py     HeadroomCompressionProvider (lazy import)
│   ├── initializer.py          .contextos/ directory setup
│   ├── repo_index.py           PROJECT_INDEX.md generation
│   ├── safety.py               Read-only enforcement utilities
│   └── token_counter.py        tiktoken with regex fallback
│
└── exporters/
    ├── base.py                 build_export() pipeline + render_context()
    ├── claude.py               CLAUDE_CONTEXT.md
    ├── codex.py                CODEX_CONTEXT.md
    ├── cursor.py               CURSOR_CONTEXT.md
    └── aider.py                AIDER_CONTEXT.md

Key Data Models¶

`FileResult` (context_selector.py)¶

@dataclass
class FileResult:
    rel_path: str       # relative path from repo root
    kind: str           # "full" | "summary"
    content: str        # file content or summary text
    score: float        # relevance score 0.0–1.0
    tokens: int         # estimated token count
    reasons: list[str]  # why this file was selected

`ContextSelection` (context_selector.py)¶

@dataclass
class ContextSelection:
    selected: list[FileResult]   # files within budget, ranked by score
    excluded: list[str]          # rel_paths that didn't fit
    budget: int                  # token budget passed in
    used_tokens: int             # actual tokens used
    secret_warnings: list[str]   # "path:line [secret:pattern] snippet"

`SelectionConfig` (context_selector.py)¶

@dataclass
class SelectionConfig:
    budget: int = 8000
    no_source: bool = False       # summaries only
    allow_sensitive: bool = False # skip secret redaction

`PackConfig` (pack_builder.py)¶

@dataclass
class PackConfig:
    budget: int = 8000
    include_tests: bool = True
    no_source: bool = False
    fmt: str = "md"               # "md" | "json"
    add_timestamp: bool = True
    allow_sensitive: bool = False
    compress: str | None = None   # "headroom" | None

`ExportConfig` (exporters/base.py)¶

@dataclass
class ExportConfig:
    budget: int = 8000
    include_tests: bool = True
    no_source: bool = False
    add_timestamp: bool = True
    allow_sensitive: bool = False

Context Selection¶

_select() in context_selector.py implements a two-pass algorithm:

Pass 1 — Scoring. Each file receives a relevance score combining: - Keyword overlap between task description and file path/summary - Import graph centrality (files imported by many others score higher) - Recency signal (files mentioned in CURRENT_TASK.md or MEMORY.md)

Pass 2 — Budget enforcement. _enforce_budget() greedily fills the budget: 1. Try embedding the full file content. If it fits, add it as kind="full". 2. If it doesn't fit, try embedding just the summary. If that fits, add it as kind="summary". 3. If even the summary doesn't fit, exclude the file.

Secret redaction happens inside _enforce_budget() — content is scanned and redacted before token counting, so the budget reflects the redacted size.

Secret Detection¶

secret_detector.py implements 14 regex patterns covering:

Category	Patterns
AI keys	OpenAI (`sk-`), Anthropic (`sk-ant-`)
Cloud	AWS access key ID (`AKIA...`), AWS secret key
VCS	GitHub classic PATs (`ghp_`, `gho_`, `ghu_`, `ghs_`), fine-grained (`github_pat_`)
Auth	JWTs (`eyJ...`), PEM private keys, Bearer tokens
Services	Slack (`xoxb-`), Stripe live/restricted keys
Config	Database URLs with passwords, env-style `KEY=value` assignments

Filename exclusion runs before content scanning. Files matching .env, id_rsa, *.pem, credentials.*, passwords.*, etc. are excluded from context packs entirely. .env.example and .env.sample are explicitly safe.

Value-preserving redaction for key=value patterns: only the value is replaced, so DATABASE_PASSWORD=[REDACTED_SECRET] preserves the variable name.

Compression¶

The CompressionProvider ABC defines a single method:

def compress(self, text: str, *, budget: int) -> str: ...

NoOpCompressionProvider returns text unchanged (default).

HeadroomCompressionProvider lazy-imports headroom_ai and delegates to the local proxy at http://127.0.0.1:8787 (or HEADROOM_BASE_URL). If the package is missing or the proxy is unreachable, it raises HeadroomUnavailableError with setup instructions.

Compression runs after rendering — the full rendered Markdown is compressed, then written to disk.

Exporter Pipeline¶

All four exporters (claude, codex, cursor, aider) share the same pipeline via build_export() in exporters/base.py:

_ensure_scan() → _load_summaries_safe() → _select() → _load_text_safe() → render() → write()

Each exporter provides: - FILENAME — output filename (e.g. CLAUDE_CONTEXT.md) - TOOL_NAME — display name - USAGE_NOTE — how to load the file in the target tool - _INSTRUCTIONS — tool-specific agent instructions appended to the pack - render() — thin wrapper around render_context() from base.py

Design Constraints¶

Determinism. Same inputs → same output. File ordering is always sorted. Scores are deterministic given the same task string. No random seeds, no timestamps in selection logic.

No optional imports at module level. tiktoken, headroom_ai are lazy-imported inside functions with graceful fallbacks. A missing optional dependency never prevents ContextOS from running.

Read-only. ContextOS never writes outside .contextos/ directories. Source files are opened in read mode only.

No LLM calls by default. All analysis is static. The compression step (Headroom) is explicitly opt-in and still uses a local model — no external API calls.