Shared Memory

SharedMemory (mtf/memory.py) is a plain Python object — a list of MemoryEntry objects — passed by reference to every phase, agent, and the debate engine. It is the single shared state accumulator for the entire pipeline run.


Data model

class MemoryEntry:
    kind: MemoryKind      # canonical tag (see table below)
    content: str          # the stored text
    metadata: dict        # arbitrary key-value pairs (agent_id, phase, filename, …)

SharedMemory.add(kind, content, **metadata) appends a new entry. SharedMemory.filter(*kinds) returns entries matching any of the specified kinds. SharedMemory.format_context(*kinds) produces the prompt-ready context block (see below).
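The data model and the add/filter behaviour described above can be sketched as follows. This is a minimal illustration, not the contents of mtf/memory.py: the use of dataclasses and Enum, the `entries` attribute name, the three kinds shown, and the sample file name are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryKind(Enum):
    # Only three of the documented kinds, for brevity.
    IMAGE_DATA = "IMAGE_DATA"
    LITERATURE = "LITERATURE"
    USER_FEEDBACK = "USER_FEEDBACK"


@dataclass
class MemoryEntry:
    kind: MemoryKind
    content: str
    metadata: dict = field(default_factory=dict)


class SharedMemory:
    """Illustrative stand-in; `entries` is an assumed attribute name."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def add(self, kind: MemoryKind, content: str, **metadata) -> None:
        # Append-only, so entries keep their chronological order.
        self.entries.append(MemoryEntry(kind, content, dict(metadata)))

    def filter(self, *kinds: MemoryKind) -> list[MemoryEntry]:
        # An entry matches if its kind is any of the requested kinds.
        return [e for e in self.entries if e.kind in kinds]


memory = SharedMemory()
# "rt_curve.png" is a hypothetical file name.
memory.add(MemoryKind.IMAGE_DATA, "## Image Type\nLine graph", filename="rt_curve.png")
memory.add(MemoryKind.LITERATURE, "Ranked hypotheses ...", agent_id=0)
memory.add(MemoryKind.USER_FEEDBACK, "Increase the temperature range in your search.")

selected = memory.filter(MemoryKind.IMAGE_DATA, MemoryKind.USER_FEEDBACK)
```

Because the object is passed by reference, every holder of `memory` sees the same three entries; `selected` contains the IMAGE_DATA and USER_FEEDBACK entries in insertion order.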


MemoryKind values

| Kind | Written by | What it contains |
| --- | --- | --- |
| IMAGE_DATA | ImageDigestAgent | Quantitative digest of one image/PDF, or the cross-file synthesis when multiple files are provided. Each entry carries source_file and filename metadata. |
| LITERATURE | LiteratureAgent | Full structured report from one literature agent instance: relevant papers, ranked hypotheses with basis/verification/failure-mode classification, key equations. |
| DEBATE | DebateEngine | Synthesis produced at the end of each phase’s debate call. Carries phase metadata ("literature", "fitting", or "review"). |
| USER_FEEDBACK | HumanInterface | Free-text guidance entered by the user when rejecting a debate round. Read by literature and fitting agents in subsequent rounds. |
| HYPOTHESIS | Literature phase | Each approved hypothesis extracted from the literature synthesis (one entry per hypothesis line). Passed to the fitting phase as the list to iterate over. |
| FIT_RESULT | FittingAgent | The result dict from one fitting agent’s exec() run, formatted as text. Carries agent_id and hypothesis metadata. |
| REVIEW | ReviewerAgent | Full review report from one reviewer agent instance, including per-hypothesis SUPPORTED/PLAUSIBLE/SPECULATIVE/REJECTED verdicts with check IDs cited. |
| CONVENTIONS | Literature phase (GPD) | Physics convention defaults for one subfield, returned by GPD subfield_defaults. Carries domain metadata. Locked once per session before the first literature fan-out. |
| PHYSICS_VERDICT | Fitting phase, literature phase, DebateEngine (all GPD) | Structured check results from GPD verification tools. Written by: _run_phase_physics_checks (checks 5.1 + 5.3 per fit report), _screen_hypothesis_plausibility (limiting-case screen), FittingAgent.fit() (pre-exec convention check), and DebateEngine (dimensional check postscript for fitting/review phases). Injected into every debate synthesis context. |
| FITTING_WARNINGS | Fitting phase (GPD) | Pre-dispatch pitfall warnings combining domain lookup_pattern results (sign-error, convergence-issue categories) and check_error_classes output per hypothesis. Written by _prefetch_fitting_warnings() before the fitting fan-out. Carries domain and hypothesis metadata. |
| DOMAIN_PATTERNS | Literature phase (GPD) | Cross-session convention-pitfall patterns pre-fetched before the first literature fan-out via lookup_pattern(category="convention-pitfall"). Carries domain and source metadata. |
| DOMAIN_CLASSIFICATION | MTFOrchestrator._classify_domains() | Audit trail of auto-detected physics domains when config.auto_detect_domains=True. Records either the detected list or a fallback notice. Informational only; not consumed by agents via extra_kinds. |
| PROPOSALS | Review phase (run_review_phase) | Synthesized list of proposed new measurements, ordered by discriminating power. Written directly by the review phase after calling DebateEngine.synthesize(store_as_debate=False), which bypasses DEBATE storage to avoid duplicating the text in both kinds. Appended to the final report. |
| TOOLKIT_DIGEST | ToolBuilderAgent | Summary of data and model items parsed from complex user-supplied input by the tool-builder agent. |
| QUALITATIVE_EVAL | QualitativeEvaluationAgent | Qualitative hypothesis evaluation report produced when --no-fitting is used. Contains per-hypothesis SUPPORTED/PLAUSIBLE/SPECULATIVE/REJECTED verdicts based on theory and image data, without numerical fitting. Read by ReviewerAgent. |
| FITTING_SKIPPED | Qualitative phase | Flag entry written when the fitting phase is skipped. Content: "Fitting phase was skipped (--no-fitting). Qualitative evaluation substituted." Read by ReviewerAgent for context. |
| PHENOMENON | MTFOrchestrator.run() | Original user phenomenon text, written once at run start before any phase. Guards against double-write. Never overwritten. |
| INTEGRITY_WARNING | FittingAgent.fit() | Fabrication/integrity warnings from post-exec checks in run_fitting_code(): optimizer not called, chi² negative, or parameters empty. Read by ReviewerAgent. |


Context injection

Each concrete agent specifies which MemoryKind values it needs by passing extra_kinds to BaseAgent._query(). Only the requested kinds are included in the context block prepended to the agent’s prompt.

Context format

SharedMemory.format_context(*kinds) produces:

=== SHARED CONTEXT ===
--- INDEX ---
[1] [USER_FEEDBACK] Increase the temperature range in your search...
[2] [IMAGE_DATA] ## Image Type\nLine graph …
[3] [CONVENTIONS] {"subfield": "condensed_matter", ...
[4] [PHENOMENON] We observe a sharp resistance drop...
--- FULL ENTRIES BELOW ---
[USER_FEEDBACK] Increase the temperature range in your search.
[IMAGE_DATA] ## Image Type\nLine graph …\n## Axes and Units …
[CONVENTIONS] {"subfield": "condensed_matter", "metric_signature": "+---", …}
[PHENOMENON] We observe a sharp resistance drop to zero at 92 K…
=== END CONTEXT ===

The index is prepended automatically when more than 3 entries are present, giving agents a navigable table of contents. _format_index(entries) is the private helper; format_index() is its public thin wrapper.
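The layout and the index rule can be sketched as below. This is an approximation under stated assumptions, not the real SharedMemory.format_context(): entries are simplified to (kind, content) tuples, the helpers are free functions, and the `INDEX_THRESHOLD` and `PREVIEW_CHARS` names and the 50-character preview length are hypothetical.

```python
INDEX_THRESHOLD = 3   # index appears when more than this many entries are selected
PREVIEW_CHARS = 50    # assumed preview length; the real value is not documented here


def format_index(entries):
    """Return one numbered preview line per entry."""
    lines = []
    for i, (kind, content) in enumerate(entries, start=1):
        preview = content.replace("\n", "\\n")  # keep each preview on one line
        if len(preview) > PREVIEW_CHARS:
            preview = preview[:PREVIEW_CHARS] + "..."
        lines.append(f"[{i}] [{kind}] {preview}")
    return lines


def format_context(entries, *kinds):
    """Render (kind, content) pairs; passing no kinds selects every entry."""
    selected = [e for e in entries if not kinds or e[0] in kinds]
    out = ["=== SHARED CONTEXT ==="]
    if len(selected) > INDEX_THRESHOLD:
        out.append("--- INDEX ---")
        out.extend(format_index(selected))
        out.append("--- FULL ENTRIES BELOW ---")
    out.extend(f"[{kind}] {content}" for kind, content in selected)
    out.append("=== END CONTEXT ===")
    return "\n".join(out)


entries = [
    ("USER_FEEDBACK", "Increase the temperature range in your search."),
    ("IMAGE_DATA", "## Image Type\nLine graph"),
    ("CONVENTIONS", '{"subfield": "condensed_matter"}'),
    ("PHENOMENON", "We observe a sharp resistance drop to zero at 92 K"),
]

with_index = format_context(entries)                    # 4 entries: index included
without_index = format_context(entries, "IMAGE_DATA")   # 1 entry: no index
```

With four entries selected the index section is emitted; with a single kind requested only the header, the entry, and the footer remain.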

BaseAgent._build_prompt() prepends this block:

=== SHARED CONTEXT ===
…
=== END CONTEXT ===

Task: Investigate the following experimental phenomenon …

After the context block and task text, _build_prompt() always appends an honesty-enforcement reminder (_HONESTY_REMINDER). When CONVENTIONS entries are present, a convention-lock reminder (_CONVENTION_REMINDER) is also appended.

Which kinds each agent reads

| Agent | extra_kinds passed to _query() |
| --- | --- |
| LiteratureAgent.investigate() | USER_FEEDBACK, IMAGE_DATA, CONVENTIONS, DOMAIN_PATTERNS |
| FittingAgent.identify_needed_toolkit_items() | LITERATURE, DEBATE, IMAGE_DATA, CONVENTIONS, FITTING_WARNINGS, DOMAIN_PATTERNS |
| FittingAgent.fit() | LITERATURE, DEBATE, USER_FEEDBACK, IMAGE_DATA, CONVENTIONS, FITTING_WARNINGS, DOMAIN_PATTERNS |
| QualitativeEvaluationAgent.evaluate() | IMAGE_DATA, LITERATURE, DEBATE, USER_FEEDBACK, CONVENTIONS, PHYSICS_VERDICT |
| ReviewerAgent.review() | LITERATURE, DEBATE, FIT_RESULT, USER_FEEDBACK, IMAGE_DATA, CONVENTIONS, PHYSICS_VERDICT, QUALITATIVE_EVAL, FITTING_SKIPPED, INTEGRITY_WARNING |
| ProposalAgent.propose() | IMAGE_DATA, LITERATURE, DEBATE, HYPOTHESIS, FIT_RESULT, USER_FEEDBACK, CONVENTIONS, PHYSICS_VERDICT |

DebateEngine.synthesize() always calls memory.format_context() with no arguments and therefore receives all entries regardless of kind. In addition, it explicitly appends the CONVENTIONS and PHYSICS_VERDICT entries to the user content block.

MemoryKind.PHENOMENON is always present in memory once the orchestrator has started. It is not in any agent’s extra_kinds, so it reaches a prompt only when DebateEngine.synthesize() calls format_context() with no arguments and receives all kinds (or would, if an agent’s extra_kinds were ever extended to cover every kind). Its primary role is as an audit anchor and as context for the debate synthesis, not as a per-agent prompt injection.

Why IMAGE_DATA is included in every agent’s context

Every agent listed above (LiteratureAgent, FittingAgent, QualitativeEvaluationAgent, ReviewerAgent, and ProposalAgent) includes IMAGE_DATA in its extra_kinds. As a result, the numerical data extracted from user-supplied plots (axis values, data series as Python lists, peak positions, slopes, error bars) is automatically visible to every agent that performs analysis, and no explicit passing of data is required.


Accumulation order

Entries accumulate in chronological order within a single pipeline run:

[PHENOMENON]        one entry at run start                 (orchestrator init)
[IMAGE_DATA]        per file, then cross-file synthesis    (phase 0)
[CONVENTIONS]       per domain                             (start of phase 1)
[LITERATURE]        N entries, one per lit agent           (phase 1 round 1 …)
[USER_FEEDBACK]     0 or more, one per rejection           (phase 1)
[DEBATE]            one per round (phase="literature")     (phase 1)
[HYPOTHESIS]        one per approved hypothesis line       (phase 1 approval)
# Fitting path (default):
[FITTING_WARNINGS]  0 or more, per domain × hypothesis     (phase 2 pre-dispatch)
[PHYSICS_VERDICT]   0 or more (convention check pre-exec)  (phase 2 fit)
[FIT_RESULT]        M × N_hypotheses entries               (phase 2)
[INTEGRITY_WARNING] 0 or more (if fabrication detected)    (phase 2 fit)
[PHYSICS_VERDICT]   0 or more (checks 5.1 + 5.3)           (phase 2 post-fit)
[DEBATE]            one (phase="fitting")                  (phase 2)
[PHYSICS_VERDICT]   0 or more (dimensional postscript)     (phase 2 debate)
# Qualitative path (--no-fitting):
[QUALITATIVE_EVAL]  N entries, one per eval agent          (phase 2)
[FITTING_SKIPPED]   one flag entry                         (phase 2)
[DEBATE]            one (phase="qualitative")              (phase 2)
# Review (after either path):
[REVIEW]            K entries, one per reviewer            (phase 3)
[PHYSICS_VERDICT]   0 or more (run_check per hypothesis)   (phase 3)
[DEBATE]            one (phase="review")                   (phase 3)
[PHYSICS_VERDICT]   0 or more (dimensional postscript)     (phase 3 debate)
[PROPOSALS]         one (proposal synthesis)               (phase 3)

Thread safety

SharedMemory contains no locks. It is safe under asyncio’s single-threaded event loop for the main pipeline.
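The reasoning can be illustrated with a small asyncio fan-out. The sketch assumes that adding an entry is a plain synchronous list append, which the description above suggests but does not state outright; the `shared` list and the `agent` and `fan_out` coroutines are hypothetical stand-ins.

```python
import asyncio

shared = []  # stand-in for the list inside SharedMemory


async def agent(agent_id: int) -> None:
    # Yield to the event loop, as an awaited LLM call would.
    await asyncio.sleep(0)
    # A synchronous append contains no await, so no other coroutine
    # can run in the middle of it.
    shared.append(("LITERATURE", f"report from agent {agent_id}"))


async def fan_out(n: int) -> None:
    # All agents run concurrently on one thread and one event loop.
    await asyncio.gather(*(agent(i) for i in range(n)))


asyncio.run(fan_out(5))
```

Coroutines hand over control only at await points, so concurrent agents cannot interleave inside a single append and every entry arrives intact without a lock.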

The GUI (StreamlitInterface) runs the orchestrator in a separate daemon thread with its own event loop. The orchestrator thread and the Streamlit UI thread communicate through queue.Queue pairs. SharedMemory is read and written only from the orchestrator thread, never from the Streamlit thread, so no cross-thread access occurs.
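A minimal sketch of that queue-pair pattern follows. The message dictionaries, the queue names, and the `orchestrator_worker` function are hypothetical, and the worker's event loop is omitted for brevity; only the threading and queue standard-library calls are real.

```python
import queue
import threading

to_ui = queue.Queue()    # orchestrator thread -> UI thread
to_orch = queue.Queue()  # UI thread -> orchestrator thread


def orchestrator_worker() -> None:
    memory = []  # shared-memory stand-in, touched only by this thread
    memory.append(("DEBATE", "synthesis text"))
    # Send the UI a copy of the text, never the memory object itself.
    to_ui.put({"type": "approval_request", "text": memory[-1][1]})
    reply = to_orch.get(timeout=5)  # block until the UI answers
    if not reply["approved"]:
        memory.append(("USER_FEEDBACK", reply["feedback"]))
    to_ui.put({"type": "done", "n_entries": len(memory)})


worker = threading.Thread(target=orchestrator_worker, daemon=True)
worker.start()

# The code below plays the role of the UI thread.
request = to_ui.get(timeout=5)
to_orch.put({"approved": False, "feedback": "Increase the temperature range."})
final = to_ui.get(timeout=5)
worker.join(timeout=5)
```

queue.Queue does its own locking, so the two threads exchange messages safely while the memory itself stays confined to the worker.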