Basic concepts

A deep dive into HyperAgents: agents, the evolutionary loop, the archive, evaluation, and execution. Workflow diagrams sit alongside the narrative so everything can be read in one place.

Overview

HyperAgents combines evolutionary computation and Quality-Diversity (QD) ideas: keep an archive of agent versions, score each one, and use strong ancestors as parents for the next mutation (MetaAgent edits).

Workflow diagrams

[Diagrams not reproduced in this text version: Evolutionary loop (outer); One generation (sequence); TaskAgent vs MetaAgent (programs); Execution mode.]

The two agents

TaskAgent — the worker

  • Input: formatted task string (from domain.formatInput).
  • Output: prediction string (and optional structured result).
  • Tools: domain-specific, optional (e.g. calculator, bash).
  • Implementation: src/agent/task_agent.ts.

Behavior is mostly prompt + tools — both can be edited by the MetaAgent.

MetaAgent — the improver

  • Input: repository path, evaluation paths, parent score context.
  • Output: modified files / diffs on disk.
  • Tools: framework bash + editor only.
  • Implementation: src/agent/meta_agent.ts.

The MetaAgent is the mutation operator: it does not solve tasks directly; it rewrites what does.

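How they cooperate

The MetaAgent edits the prompt, tools, and code that the TaskAgent runs with; the harness then runs the TaskAgent on the domain's tasks, and the resulting scores become parent score context for the next MetaAgent run.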

The evolutionary loop

Implemented in src/core/generate_loop.ts. Each generation typically:

  1. Select parent from the archive (select_parent.ts).
  2. Set up executor (local or Docker).
  3. Apply lineage — replay patches so the workspace matches the parent.
  4. Run MetaAgent — produce a new patch from failures and context.
  5. Run TaskAgent via harness (staged eval may run first if configured).
  6. Evaluate — domain scores predictions; reports written under output dir.
  7. Update archive — append JSONL snapshot with new genId, scores, patch list.
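The steps above can be sketched as a single function. This is a minimal sketch, not the code in src/core/generate_loop.ts: ArchiveEntry, GenerationSteps, and runOneGeneration are hypothetical names, executor setup (step 2) is left out, and the callbacks are synchronous only to keep the example short (real agent calls would be asynchronous).

```typescript
// Hypothetical shapes for illustration; the real types live in the framework.
interface ArchiveEntry {
  genId: string;
  parentId: string | null;
  patchFiles: string[];
  scores: Record<string, number>;
}

// Every step is injected so the control flow can be read (and tested) in isolation.
interface GenerationSteps {
  selectParent(archive: ArchiveEntry[]): ArchiveEntry; // step 1
  applyLineage(patchFiles: string[]): void;            // step 3
  runMetaAgent(parent: ArchiveEntry): string;          // step 4, returns the new patch path
  runAndEvaluate(): Record<string, number>;            // steps 5 and 6
}

function runOneGeneration(
  archive: ArchiveEntry[],
  steps: GenerationSteps,
  genId: string
): ArchiveEntry {
  const parent = steps.selectParent(archive);
  steps.applyLineage(parent.patchFiles);
  const newPatch = steps.runMetaAgent(parent);
  const scores = steps.runAndEvaluate();
  const child: ArchiveEntry = {
    genId,
    parentId: parent.genId,
    // The child's lineage is the parent's patches plus the one just produced.
    patchFiles: parent.patchFiles.concat([newPatch]),
    scores,
  };
  archive.push(child); // step 7, in memory only; the real loop appends a JSONL snapshot
  return child;
}
```

Passing the steps in as callbacks is a choice made for this sketch; it lets the ordering be checked with stubs before any model is involved.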

Configuration sketch

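// Assumes GenerateLoopConfig, TaskAgent, and getFrameworkTools are imported from the
// framework, and that myDomain, metaAgent, and model are defined earlier.
const config: GenerateLoopConfig = {
  domains: [myDomain],
  metaAgent,
  taskAgentFactory: (t) => new TaskAgent({ model, tools: t }),
  tools: getFrameworkTools(),
  outputDir: "./outputs/evolution", // archive and per-generation reports are written here
  repoPath: ".",
  maxGenerations: 5,
  executionMode: "local", // local or Docker executor
  parentSelection: "score_child_prop", // default strategy
  evalSamples: 10,
};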

The archive

The archive is an append-only JSONL file: each line is a full snapshot { archive, entries }. Read the last line for current state.
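Reading and appending can be sketched over the file's text content. This is a minimal sketch: the field types inside ArchiveSnapshot are assumptions (the text only names the keys archive and entries), and latestSnapshot and appendSnapshot are hypothetical helpers, not framework functions.

```typescript
// Assumed field types; only the keys `archive` and `entries` come from the text.
interface ArchiveSnapshot {
  archive: string[];
  entries: Record<string, unknown>;
}

// Current state is the last non-empty line of the JSONL content.
function latestSnapshot(jsonl: string): ArchiveSnapshot | null {
  const last = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .pop();
  if (last === undefined) return null;
  return JSON.parse(last) as ArchiveSnapshot;
}

// Appending never rewrites earlier lines; it only adds one more snapshot.
function appendSnapshot(jsonl: string, snapshot: ArchiveSnapshot): string {
  const needsNewline = jsonl !== "" && jsonl.charAt(jsonl.length - 1) !== "\n";
  return jsonl + (needsNewline ? "\n" : "") + JSON.stringify(snapshot) + "\n";
}
```

Earlier lines stay in the file, so any past snapshot can still be parsed by picking a different line.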

Entry shape (conceptual)

| Field | Meaning |
| --- | --- |
| genId | Unique generation id |
| parentId | Parent generation (tree edge) |
| patchFiles | Patch paths in lineage |
| scores | Per-domain numeric scores |
| validParent | Can future gens use this as parent? |
| metadata | Run metadata (e.g. run_eval) |

Lineage is a tree

Branches are normal: parentId points to the actual ancestor, not necessarily the latest id.
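Following parentId links recovers the path from the root to any generation. The sketch below assumes string ids and a null parentId for the root; LineageNode and lineageOf are hypothetical names used only for illustration.

```typescript
// Assumed shape: string ids, and parentId === null marks the root.
interface LineageNode {
  genId: string;
  parentId: string | null;
}

// Walk parentId links back to the root and return the path root-first.
function lineageOf(genId: string, entries: LineageNode[]): string[] {
  const byId: Record<string, LineageNode | undefined> = {};
  for (const entry of entries) byId[entry.genId] = entry;

  const path: string[] = [];
  let current: string | null = genId;
  while (current !== null) {
    const node: LineageNode | undefined = byId[current];
    if (node === undefined) throw new Error("unknown genId: " + current);
    if (path.indexOf(node.genId) !== -1) throw new Error("cycle at genId: " + current);
    path.unshift(node.genId);
    current = node.parentId;
  }
  return path;
}
```

With generations initial, 1, 2 (child of 1), and 3 (also a child of 1), lineageOf("3", ...) skips 2 entirely, which is the branching behavior described above.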

Why JSONL?

Appending a line is cheap; you keep history of every snapshot without rewriting a giant JSON file. See also JSONL vs JSON in the table below.

| | JSON | JSONL |
| --- | --- | --- |
| Structure | One object per file | One object per line |
| Append | Rewrite file | Append line |
| Latest state | Parse all | Read last line |
| Typical use here | report.json, predictions.json | archive.jsonl |

Parent selection strategies

From src/core/select_parent.ts, chosen once in config for the whole run:

| Strategy | Behavior |
| --- | --- |
| random | Uniform over valid parents; maximum exploration |
| latest | Most recent valid parent; simple chain |
| best | Highest score; pure exploitation |
| score_prop | Random, weighted by score |
| score_child_prop | Score-weighted with child penalty (default); explores under-used parents |

Why not always best? You can get stuck in a local maximum; sometimes a weaker parent opens a path to a better global solution.

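The default strategy, score_child_prop, weights each valid parent as weight = (score + 0.01) / (1 + numChildren). The 0.01 offset keeps zero-score parents selectable, and the denominator halves a parent's weight after its first child.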
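The child-penalty weight can be turned into a sampling routine as follows. This is a sketch under assumptions, not the code in src/core/select_parent.ts: the Candidate shape, the roulette-wheel sampling, and the injectable rand parameter are all illustrative choices; only the weight formula comes from the text.

```typescript
// Hypothetical candidate shape; scores are assumed to lie in 0..1.
interface Candidate {
  genId: string;
  score: number;
  numChildren: number;
}

// weight = (score + 0.01) / (1 + numChildren), as stated in the text.
function childPenaltyWeight(c: Candidate): number {
  return (c.score + 0.01) / (1 + c.numChildren);
}

// Roulette-wheel sampling; `rand` is injectable so a run can be reproduced.
function pickParent(cands: Candidate[], rand: () => number = Math.random): Candidate {
  const total = cands.reduce((sum, c) => sum + childPenaltyWeight(c), 0);
  let threshold = rand() * total;
  let fallback: Candidate | null = null;
  for (const c of cands) {
    fallback = c;
    threshold -= childPenaltyWeight(c);
    if (threshold < 0) return c;
  }
  if (fallback === null) throw new Error("no valid parents");
  return fallback; // reached only through floating-point drift
}
```

For example, a parent with score 0.8 and three children gets weight 0.81 / 4 = 0.2025, while a parent with score 0.5 and no children gets 0.51, so the weaker but unexplored parent is more than twice as likely to be picked.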

Domains and evaluation

A Domain (src/domains/base.ts) defines your benchmark:

  • config — name, splits, score keys, sample counts.
  • loadTasks — async load of DomainTask[].
  • evaluate — score one prediction (usually 0–1).
  • formatInput — task → model prompt.
  • report — aggregate EvalResult[] into summary.
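To make the five members above concrete, here is a toy domain. The shapes are guesses inferred from the list (the real definitions are in src/domains/base.ts and may differ), config is reduced to two fields, and DomainSketch and arithmeticDomain are hypothetical names.

```typescript
// Hypothetical shapes inferred from the list above; not the framework's definitions.
interface DomainTask {
  id: string;
  question: string;
  expected: string;
}

interface EvalResult {
  taskId: string;
  score: number;
}

interface DomainSketch {
  config: { name: string; scoreKey: string };
  loadTasks(): Promise<DomainTask[]>;
  formatInput(task: DomainTask): string;
  evaluate(prediction: string, task: DomainTask): number; // usually 0..1
  report(results: EvalResult[]): { mean: number; count: number };
}

const arithmeticDomain: DomainSketch = {
  config: { name: "arithmetic", scoreKey: "accuracy" },
  // A real domain would typically read tasks from disk; these are inlined.
  async loadTasks() {
    return [
      { id: "t1", question: "2 + 2", expected: "4" },
      { id: "t2", question: "3 * 5", expected: "15" },
    ];
  },
  formatInput(task) {
    return "Compute and reply with the number only: " + task.question;
  },
  evaluate(prediction, task) {
    return prediction.trim() === task.expected ? 1 : 0;
  },
  report(results) {
    const total = results.reduce((sum, r) => sum + r.score, 0);
    const mean = results.length === 0 ? 0 : total / results.length;
    return { mean, count: results.length };
  },
};
```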

Example domains in the repo include bash, scoring, calculator, factcheck, paper review, and git evolution demos.

Evaluators

src/domains/evaluators.ts provides three patterns:

  1. staticEvaluator — normalized string equality; free and deterministic.
  2. llmJudgeEvaluator — rubric-based model scoring; costs tokens.
  3. humanFeedbackEvaluator — map user ratings to the 0–1 interval.

Pick the one that matches task objectivity and budget.
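Two of these patterns are simple enough to sketch without a model. The normalization rules (trim, lowercase, collapse whitespace) and the 1-to-5 rating scale are assumptions made for this example; the helpers in src/domains/evaluators.ts may behave differently, and staticScore and ratingToScore are hypothetical names.

```typescript
// Assumed normalization: trim, lowercase, collapse runs of whitespace.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Static pattern: 1 when the normalized strings match, otherwise 0.
function staticScore(prediction: string, expected: string): number {
  return normalize(prediction) === normalize(expected) ? 1 : 0;
}

// Human-feedback pattern: map a rating in 1..maxRating linearly onto 0..1.
function ratingToScore(rating: number, maxRating: number = 5): number {
  const clamped = Math.min(Math.max(rating, 1), maxRating);
  return (clamped - 1) / (maxRating - 1);
}
```

An LLM judge is omitted because it needs a model call; it would return a number on the same 0 to 1 scale.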

The harness

src/domains/harness.ts connects the TaskAgent to a domain's tasks. It is used for one-off evals and inside runGenerateLoop.
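A harness in this sense is a loop that feeds each task to the agent, stores the prediction, and scores it. The sketch below is an assumption-laden illustration, not the code in src/domains/harness.ts: runHarness, HarnessTask, and HarnessReport are hypothetical names, and the agent is modeled as a synchronous function for brevity.

```typescript
// Hypothetical shapes for illustration.
interface HarnessTask {
  id: string;
  input: string;
  expected: string;
}

interface HarnessReport {
  predictions: Record<string, string>;
  scores: Record<string, number>;
  meanScore: number;
}

function runHarness(
  tasks: HarnessTask[],
  predict: (input: string) => string, // stands in for the TaskAgent
  evaluate: (prediction: string, task: HarnessTask) => number
): HarnessReport {
  const predictions: Record<string, string> = {};
  const scores: Record<string, number> = {};
  let total = 0;
  for (const task of tasks) {
    const prediction = predict(task.input);
    const score = evaluate(prediction, task);
    predictions[task.id] = prediction;
    scores[task.id] = score;
    total += score;
  }
  const meanScore = tasks.length === 0 ? 0 : total / tasks.length;
  return { predictions, scores, meanScore };
}
```

Keeping predictions and scores in separate maps mirrors the split between predictions.json and report.json described elsewhere on this page.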

Predictions vs scores

| | Score | Prediction |
| --- | --- | --- |
| What | Number from 0 to 1 | Model output string |
| Typical files | report.json | predictions.json |
| Used for | Parent selection, ranking | User-facing output, debugging |

ensemble (src/core/ensemble.ts) picks a high-scoring generation and returns its prediction for a given question.
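One way to read that description is "try generations best-first and return the first available prediction". The sketch below implements that reading only; the fallback to lower-scoring generations is an assumption, and GenerationRecord and ensemblePredict are hypothetical names rather than the API of src/core/ensemble.ts.

```typescript
// Hypothetical record: one overall score plus predictions keyed by question id.
interface GenerationRecord {
  genId: string;
  score: number;
  predictions: Record<string, string | undefined>;
}

// Try generations from highest to lowest score; return the first prediction found.
function ensemblePredict(
  generations: GenerationRecord[],
  questionId: string
): string | undefined {
  const bestFirst = generations.slice().sort((a, b) => b.score - a.score);
  for (const gen of bestFirst) {
    const prediction = gen.predictions[questionId];
    if (prediction !== undefined) return prediction;
  }
  return undefined;
}
```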

Executors

src/utils/executor.ts — same interface, two modes:

  • Local — temp directory, fastest for development; host must trust generated code.
  • Docker — per-generation container; slower, safer for untrusted codegen.
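The "same interface, two modes" idea can be illustrated by a function that turns a shell command into a launch plan for either mode. Everything here is hypothetical: LaunchPlan and buildLaunchPlan are not framework names, the image name is a placeholder, and the actual code in src/utils/executor.ts may work quite differently. The Docker flags used (--rm, -v, -w) are standard docker run options.

```typescript
type ExecutionMode = "local" | "docker";

// What to spawn and where; `cwd` is only needed for local runs.
interface LaunchPlan {
  argv: string[];
  cwd?: string;
}

function buildLaunchPlan(
  mode: ExecutionMode,
  workspace: string, // absolute path to the generation's working directory
  command: string,
  image: string = "node:20" // placeholder image name, an assumption
): LaunchPlan {
  if (mode === "local") {
    // Runs directly on the host, so generated code must be trusted.
    return { argv: ["sh", "-c", command], cwd: workspace };
  }
  // Mounts the workspace into a throwaway container that is removed on exit.
  return {
    argv: [
      "docker", "run", "--rm",
      "-v", workspace + ":/workspace",
      "-w", "/workspace",
      image, "sh", "-c", command,
    ],
  };
}
```

The caller sees the same plan shape in both modes, which is what lets the rest of the loop stay unaware of where the command actually runs.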

Output layout (evolution)

Typical tree under outputDir:

outputs/bash_evolution/
├── archive.jsonl
├── gen_initial/metadata.json
├── gen_1/
│   ├── metadata.json
│   ├── agent_output/model_patch.diff
│   └── bash_eval/
│       ├── predictions.json
│       └── report.json
└── gen_2/ ...

A single eval (no loop) may produce only predictions.json and report.json.

Self-referential improvement (prompt files)

If HyperAgents is installed from npm, framework TypeScript in node_modules is not what you mutate. Instead, point agents at files in your repo:

const metaAgent = new MetaAgent({ model, promptFile: "./prompts/meta_agent.txt" });
// or
const config: GenerateLoopConfig = {
  // ...
  promptsDir: "./prompts",
};

With promptsDir, the loop can scaffold meta_agent.txt and task_agent.txt. Template placeholders such as {{repoPath}}, {{evalPath}}, {{scoreContext}} are filled at runtime (see main repo docs/concepts.md).
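Placeholder filling of this kind can be sketched with a single regular-expression replace. This is an illustration, not the framework's template code: fillTemplate is a hypothetical helper, and leaving unknown placeholders untouched is an assumption about the desired behavior.

```typescript
// Replace {{name}} with values[name]; unknown placeholders are left as-is.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

const prompt = fillTemplate(
  "Repository: {{repoPath}}\nEvaluations: {{evalPath}}\nScores: {{scoreContext}}",
  { repoPath: ".", evalPath: "./outputs/evolution", scoreContext: "parent score 0.6" }
);
```

Leaving an unknown placeholder visible in the output makes a typo in a prompt file easier to spot than silently replacing it with an empty string.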

Without promptsDir, built-in templates are used — the MetaAgent still edits your domain code and separate files, but not its packaged default prompt text.

Early termination

  • If best archive score reaches 1.0, the loop stops (no wasted compute).
  • MetaAgent prompt includes score context so it avoids needless edits when already at 100%.
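The stop condition can be written as a small predicate. This sketch assumes scores are non-negative and that the generation budget is checked in the same place; bestScore and shouldStop are hypothetical names, not functions from the framework.

```typescript
// Highest score seen so far; 0 when the archive has no scores yet.
function bestScore(scores: number[]): number {
  return scores.reduce((best, s) => (s > best ? s : best), 0);
}

// Stop when a perfect score exists or the generation budget is used up.
function shouldStop(
  scores: number[],
  generation: number,
  maxGenerations: number
): boolean {
  return bestScore(scores) >= 1.0 || generation >= maxGenerations;
}
```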

Examples overview

| Example | Demonstrates | Loop |
| --- | --- | --- |
| scoring | Prompt / grading logic | Manual or demo script |
| calculator | Fixing a buggy tool | Manual iterations |
| bash | Command generation | eval / evolve |
| factcheck | Classification | eval / evolve |
| paper_review | Accept/reject | Single eval in script |
| git_evolution | Git-native patches | Full loop |

Glossary

| Term | Definition |
| --- | --- |
| Archive | JSONL history of generations and scores |
| Domain | Task suite + evaluation for one benchmark |
| Evaluator | static / LLM judge / human scoring helper |
| Executor | Local or Docker workspace for one generation |
| Generation | One improve + evaluate cycle |
| Harness | Runs TaskAgent over domain tasks |
| MetaAgent | Edits code to improve TaskAgent |
| Parent | Archive node used as base for a child |
| Patch | Diff capturing MetaAgent changes |
| Prediction | Raw TaskAgent output for a task |
| Selection strategy | Rule for picking the next parent |

See also

  • Architecture
  • Limitations