Providing Images and PDFs#

MTF reads quantitative information from experimental images and PDF documents using Claude’s vision API.

Images (PNG, JPG, GIF, WebP)#

Supported formats: PNG, JPG, GIF, WebP.

For each image, ImageDigestAgent produces a structured digest containing:

Plot type and physical description
Axis labels, units, and scale (linear / log)
All data series as extracted numerical arrays
Key quantitative features — peak positions, plateau values, slopes, error bars, fit parameters
Any annotations or equations visible in the figure

The digest is stored as MemoryKind.IMAGE_DATA in SharedMemory and is automatically included in the context of every literature, fitting, and reviewer agent. Fitting agents can use plot-extracted data directly, even without registered toolkit arrays.

PDFs#

MTF supports PDF documents (research papers, lab notes, preprints). Pass a PDF the same way as an image:

mtf "Describe phenomenon" --files notes.pdf figure.png

Standard extraction (single pass)#

By default, FileDigestSubagent sends the full PDF to the API and extracts:

Document type, title, authors, and summary
Physical system studied and key phenomena
All central equations reproduced symbolically with symbol definitions
Experimental setup, techniques, sample parameters, and calibration details
All reported numerical values with units and uncertainties
Fitting parameters and critical scales (temperatures, fields, frequencies)
Key conclusions and proposed mechanisms
A Figure Inventory: every figure, graph, plot, table, and diagram enumerated with page number, caption, and a one-line description

Enhanced extraction (two-pass, default)#

For dense documents (many figures, 10+ pages), MTF runs a second targeted pass using a dedicated figure-extraction prompt. This is enabled by default (config.pdf_enhanced_extraction = True).

Pass 1 — General digest: The full PDF is sent with the standard scientific overview prompt, which now includes a Figure Inventory section listing every figure by page.

Pass 2 — Figure-by-figure extraction: The same PDF is sent again with a focused prompt that iterates page-by-page and for each figure extracts:

Caption text (verbatim)
Figure type (scatter plot, line graph, heatmap, table, schematic, etc.)
All axis labels, units, scale (linear/log), and full numeric range
Every data series as extracted numerical arrays: x = [...], y = [...]
Key quantitative features: peaks, plateaus, slopes, error bars, fit parameters
Physical significance in one sentence

Both pass results are combined into a single structured digest stored in SharedMemory as IMAGE_DATA.

Pass 1 and Pass 2 run in parallel (asyncio.gather()), so the two-pass path adds no wall-clock latency over a hypothetical sequential run. Pass 2 uses pdf_figure_extraction_max_tokens (default 8192) to accommodate dense figure-heavy documents; if a response is still truncated, MTF logs a warning and you can raise this value in MTFConfig.

Disabling enhanced extraction#

Pass --no-enhanced-pdf on the CLI (or set config.pdf_enhanced_extraction = False in Python) to use only the single-pass path. Useful when the PDF is short or contains no figures.

The two-pass path also has a file-size guard: if the PDF is smaller than config.pdf_min_size_kb_for_enhanced (default 200 KB), MTF automatically falls back to single-pass extraction, avoiding the overhead of a second API call for trivially short documents. You can lower or raise this threshold in MTFConfig.

Why no new dependencies?#

Both passes use the same Anthropic messages API call that is already used for images — the PDF is base64-encoded and sent as a "document" content block. No PDF parsing library is required.

CLI#

mtf "Describe phenomenon" --files paper.pdf figure.png

Python API#

report = asyncio.run(
    orchestrator.run("Describe phenomenon", files=["figure1.png"])
)

Interactive mode#

When no --files flag is given, the CLI asks whether you have files to provide before starting the analysis.

Providing Images and PDFs

Contents