Providing Images and PDFs#

MTF reads quantitative information from experimental images and PDF documents using Claude’s vision API.

Images (PNG, JPG, GIF, WebP)#

Supported formats: PNG, JPG, GIF, WebP.

For each image, ImageDigestAgent produces a structured digest containing:

  • Plot type and physical description

  • Axis labels, units, and scale (linear / log)

  • All data series as extracted numerical arrays

  • Key quantitative features — peak positions, plateau values, slopes, error bars, fit parameters

  • Any annotations or equations visible in the figure

The digest is stored as MemoryKind.IMAGE_DATA in SharedMemory and is automatically included in the context of every literature, fitting, and reviewer agent. Fitting agents can use plot-extracted data directly, even without registered toolkit arrays.

PDFs#

MTF supports PDF documents (research papers, lab notes, preprints). Pass a PDF the same way as an image:

mtf "Describe phenomenon" --files notes.pdf figure.png

Standard extraction (single pass)#

By default, FileDigestSubagent sends the full PDF to the API and extracts:

  • Document type, title, authors, and summary

  • Physical system studied and key phenomena

  • All central equations reproduced symbolically with symbol definitions

  • Experimental setup, techniques, sample parameters, and calibration details

  • All reported numerical values with units and uncertainties

  • Fitting parameters and critical scales (temperatures, fields, frequencies)

  • Key conclusions and proposed mechanisms

  • A Figure Inventory: every figure, graph, plot, table, and diagram enumerated with page number, caption, and a one-line description

Enhanced extraction (two-pass, default)#

For dense documents (many figures, 10+ pages), MTF runs a second targeted pass using a dedicated figure-extraction prompt. This is enabled by default (config.pdf_enhanced_extraction = True).

Pass 1 — General digest: The full PDF is sent with the standard scientific overview prompt, which now includes a Figure Inventory section listing every figure by page.

Pass 2 — Figure-by-figure extraction: The same PDF is sent again with a focused prompt that iterates page-by-page and for each figure extracts:

  • Caption text (verbatim)

  • Figure type (scatter plot, line graph, heatmap, table, schematic, etc.)

  • All axis labels, units, scale (linear/log), and full numeric range

  • Every data series as extracted numerical arrays: x = [...], y = [...]

  • Key quantitative features: peaks, plateaus, slopes, error bars, fit parameters

  • Physical significance in one sentence

Both pass results are combined into a single structured digest stored in SharedMemory as IMAGE_DATA.

Pass 1 and Pass 2 run in parallel (asyncio.gather()), so the two-pass path adds no wall-clock latency over a hypothetical sequential run. Pass 2 uses pdf_figure_extraction_max_tokens (default 8192) to accommodate dense figure-heavy documents; if a response is still truncated, MTF logs a warning and you can raise this value in MTFConfig.

Disabling enhanced extraction#

Pass --no-enhanced-pdf on the CLI (or set config.pdf_enhanced_extraction = False in Python) to use only the single-pass path. Useful when the PDF is short or contains no figures.

The two-pass path also has a file-size guard: if the PDF is smaller than config.pdf_min_size_kb_for_enhanced (default 200 KB), MTF automatically falls back to single-pass extraction, avoiding the overhead of a second API call for trivially short documents. You can lower or raise this threshold in MTFConfig.

Why no new dependencies?#

Both passes use the same Anthropic messages API call that is already used for images — the PDF is base64-encoded and sent as a "document" content block. No PDF parsing library is required.

CLI#

mtf "Describe phenomenon" --files paper.pdf figure.png

Python API#

report = asyncio.run(
    orchestrator.run("Describe phenomenon", files=["figure1.png"])
)

Interactive mode#

When no --files flag is given, the CLI asks whether you have files to provide before starting the analysis.