Git Sync for Agent Workspaces: Current State

Source files: src/git/manager.py, src/orchestrator.py

Design principles reference: specs/git/git.md §10

This document captures what the current git sync workflow does well, identifies the design decisions already in place, and serves as a baseline for future improvements to multi-agent workspace synchronization. The strengths documented here are formalized as design invariants in the git spec (§10) and the orchestrator spec (§10, "Design Invariants" table) to ensure they are preserved during refactoring.

What Works Well Today

1. Workspace Isolation

Each (agent, project) pair gets its own cloned directory, stored at {workspace_dir}/{project_id}/{agent.name}/{repo_name}. This is cached in the agent_workspaces SQLite table so the mapping is stable across restarts.

Why it matters: Two agents working on the same project never touch the same working tree, eliminating an entire class of filesystem-level conflicts (dirty index, mixed staged changes, etc.).

Three source types are handled cleanly:

| Source type | Workspace strategy |
| --- | --- |
| CLONE | Per-agent clone in workspace dir |
| LINK | Shared local path (all agents see the same directory) |
| INIT | Per-agent new repo in workspace dir |
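
The path layout and the per-source-type strategy can be sketched as follows. This is a minimal illustration, not the code in src/git/manager.py: the `SourceType` enum, the `resolve_workspace` helper, and its parameter names are hypothetical.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class SourceType(Enum):
    CLONE = "clone"
    LINK = "link"
    INIT = "init"


def resolve_workspace(
    source_type: SourceType,
    workspace_dir: str,
    project_id: str,
    agent_name: str,
    repo_name: str,
    source_path: Optional[str] = None,
) -> Path:
    """Return the directory an agent works in (hypothetical helper)."""
    if source_type is SourceType.LINK:
        # LINK repos are shared: every agent gets the same local path.
        if source_path is None:
            raise ValueError("LINK repos need a source_path")
        return Path(source_path)
    # CLONE and INIT repos get a private directory per (agent, project) pair.
    return Path(workspace_dir) / project_id / agent_name / repo_name
```

Two agents on the same CLONE project resolve to different directories, while two agents on a LINK project resolve to the same one.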

2. Branch-per-Task Model

Every task gets a unique branch named <task-id>/<slugified-title> (e.g. brave-fox/add-retry-logic). The task ID prefix makes branches trivially traceable back to their originating task, and the slug provides human context.

Why it matters: Agents' work is isolated at the git level, not just the filesystem level. Concurrent agents can work on different branches in their own clones without interfering with each other.
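
A branch name of this shape could be produced as in the sketch below. The exact slug rules used by the project are not stated in this document, so the `slugify` behavior here (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption, and both function names are hypothetical.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def task_branch_name(task_id: str, title: str) -> str:
    """Build a <task-id>/<slugified-title> branch name."""
    return f"{task_id}/{slugify(title)}"


print(task_branch_name("brave-fox", "Add retry logic"))
```

With the task ID `brave-fox` and the title "Add retry logic", this yields the `brave-fox/add-retry-logic` example used above.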

3. Pre-Task Fetch and Pull

prepare_for_task() always runs git fetch origin before creating the task branch, ensuring the agent starts from the latest known state of the remote. For normal clones, it also does git pull origin <default_branch> to fast-forward the local default branch.

Worktree-aware branching: When the checkout is a git worktree (detected via _is_worktree()), the code correctly avoids checking out the default branch locally (which would conflict with the main working tree) and instead creates the task branch directly from origin/<default_branch>.

Why it matters: Agents start each task from a reasonably fresh base, reducing the chance that their work diverges too far from the remote.
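
The two branching paths can be summarized as a command plan. This is a sketch only: the document names `git fetch origin` and `git pull origin <default_branch>`, but the checkout invocations and the `prepare_commands` helper are assumptions about how the steps might be expressed.

```python
from typing import List


def prepare_commands(
    branch: str, default_branch: str, is_worktree: bool
) -> List[List[str]]:
    """Return the git commands for starting a task (illustrative plan only)."""
    commands = [["git", "fetch", "origin"]]
    if is_worktree:
        # A worktree must not check out the default branch, which is already
        # checked out in the main working tree; branch from the remote ref.
        commands.append(
            ["git", "checkout", "-b", branch, f"origin/{default_branch}"]
        )
    else:
        commands.append(["git", "checkout", default_branch])
        commands.append(["git", "pull", "origin", default_branch])
        commands.append(["git", "checkout", "-b", branch])
    return commands
```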

4. Graceful Error Suppression

Git operations that may legitimately fail (no remote configured, no upstream tracking branch, network errors during fetch) are wrapped in try/except GitError: pass blocks. This allows LINK repos with no remote and newly-init'd repos to go through the same code paths as fully-configured CLONE repos.

The outer _prepare_workspace() method wraps all git operations in a catch-all that logs a warning but still returns the correct workspace path. The agent can always start work even if branch setup fails.

Why it matters: The system degrades gracefully rather than failing catastrophically when git operations don't succeed.
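
The two layers of suppression might look like the following sketch. `GitError` here is a stand-in class defined for the example, and `fetch_if_possible` and `prepare_workspace` are hypothetical names; only the overall pattern (suppress expected git failures, log and continue on anything else) comes from the description above.

```python
import contextlib
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class GitError(Exception):
    """Stand-in for the project's git error type."""


def fetch_if_possible(fetch: Callable[[], None]) -> None:
    # Inner layer: a missing remote or a network error is expected for some
    # repos, so the failure is swallowed (equivalent to except GitError: pass).
    with contextlib.suppress(GitError):
        fetch()


def prepare_workspace(workspace: str, setup_branch: Callable[[], None]) -> str:
    # Outer layer: log any failure, but always hand back the workspace path
    # so the agent can still start work.
    try:
        setup_branch()
    except Exception as exc:
        logger.warning("branch setup failed for %s: %s", workspace, exc)
    return workspace
```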

5. Post-Completion Commit

_complete_workspace() always commits agent work using commit_all(), which:

1. Runs git add -A to stage everything (including untracked files the agent created).
2. Checks git diff --cached --quiet to detect whether anything is staged.
3. Only creates a commit if there are actual changes.

Staging first and checking the index afterwards means the emptiness check sees exactly what would be committed, including newly created files, rather than a working-tree status taken before staging.

Why it matters: Agent work is never silently lost — every modification is captured in a commit before any merge/push/PR logic runs.
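
The add-then-check sequence can be sketched with `subprocess`. This is not the project's `commit_all()`; the function name `commit_all_sketch` and the injectable `run` parameter (added so the logic can be exercised without a real repository) are assumptions. It relies on `git diff --cached --quiet` exiting with status 1 when staged changes exist and 0 when the index matches HEAD.

```python
import subprocess


def commit_all_sketch(repo_dir: str, message: str, run=subprocess.run) -> bool:
    """Stage everything and commit only if something is staged.

    Returns True when a commit was created, False when there was nothing
    to commit.
    """
    # Stage tracked and untracked changes alike.
    run(["git", "add", "-A"], cwd=repo_dir, check=True)
    # Exit status 0 means the index has no changes relative to HEAD.
    staged = run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir)
    if staged.returncode == 0:
        return False
    run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```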

6. Plan Subtask Branch Accumulation

When a plan generates multiple subtasks, they all share the parent task's branch name. Subtasks use switch_to_branch() (which fetches and pulls) rather than prepare_for_task() (which would create a new branch off default). This lets sequential subtasks accumulate commits on a single branch.

Only the final subtask in a chain triggers the merge-or-PR decision, and it inherits the parent's requires_approval flag.

Why it matters: A multi-step plan produces a single coherent branch with all changes, rather than N separate branches that would each need independent review.

7. Dual Completion Paths (PR vs Direct Merge)

The system cleanly supports both:

- Tasks requiring approval → push branch + create PR via gh pr create. Task moves to AWAITING_APPROVAL. The orchestrator polls PR status every 60 seconds via gh pr view --json state,mergedAt.
- Tasks without approval → merge branch into default + push (CLONE repos) or merge locally only (LINK repos).

Both paths include error handling with user-facing notifications on failure.

Why it matters: Teams that want human review before code lands can use the PR path; solo developers or trusted automation can use direct merge.
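
The choice between the two paths can be expressed as a small decision function. The step names and the `completion_steps` helper are hypothetical labels for this sketch, not identifiers from src/orchestrator.py. The description above does not say how INIT repos are completed, so the sketch refuses to guess.

```python
from typing import List


def completion_steps(requires_approval: bool, source_type: str) -> List[str]:
    """Return the ordered completion steps for a finished task (sketch)."""
    if requires_approval:
        # PR path: the task then waits in AWAITING_APPROVAL.
        return ["push_branch", "create_pr", "await_approval"]
    if source_type == "CLONE":
        return ["merge_into_default", "push_default"]
    if source_type == "LINK":
        # No push: the merge stays local.
        return ["merge_into_default"]
    # Other source types are not covered by the description above.
    raise ValueError(f"no documented completion path for {source_type!r}")
```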

8. Merge Conflict Detection

merge_branch() attempts the merge and, on failure, runs git merge --abort to restore the working tree. The orchestrator notifies the user with a clear message identifying the conflicting task and branch.

Why it matters: A failed merge never leaves the working tree in a broken state, and the user is told exactly which branch needs manual resolution.
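
A merge-then-abort sequence along these lines might look as follows. The name `merge_or_abort`, the `--no-edit` flag, and the injectable `run` parameter are assumptions made for the sketch; the source only states that a failed merge is followed by `git merge --abort`.

```python
import subprocess


def merge_or_abort(repo_dir: str, branch: str, run=subprocess.run) -> bool:
    """Try to merge `branch` into the current branch.

    Returns True on success. On failure, aborts the merge so the working
    tree is restored, and returns False so the caller can notify the user.
    """
    result = run(["git", "merge", "--no-edit", branch], cwd=repo_dir)
    if result.returncode == 0:
        return True
    # Best effort: if no merge is in progress, the abort itself fails,
    # which is ignored here.
    run(["git", "merge", "--abort"], cwd=repo_dir)
    return False
```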

9. Branch Cleanup

After a successful merge or PR completion, the system attempts to delete the task branch both locally (git branch -D) and remotely (git push origin --delete). This is best-effort — failures are silently ignored.

Why it matters: Prevents branch proliferation without risking errors if the branch was already cleaned up (e.g. by GitHub's "delete branch after merge" setting).

10. Task Retry Resilience

Both prepare_for_task() and switch_to_branch() handle the case where the task branch already exists (e.g. after a crash or restart mid-task). Instead of failing, they switch to the existing branch so work can resume.

Why it matters: The system survives restarts and retries without requiring manual cleanup of stale branches.

11. Approval Polling with Escalation

The _check_awaiting_approval() loop handles edge cases thoughtfully:

- PR-backed tasks: Polls merge status; transitions to COMPLETED on merge or BLOCKED on close-without-merge (with downstream chain notifications).
- Tasks without a PR URL that don't require approval: Auto-completes after a grace period (handles intermediate subtasks that end up in AWAITING_APPROVAL without actually needing review).
- Tasks without a PR URL that do require approval: Sends periodic reminders (hourly) and escalates after 24 hours to prevent tasks from rotting silently.

Why it matters: No task gets permanently stuck in AWAITING_APPROVAL without the user being notified.
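
The three cases above reduce to two small decisions, sketched here. The function names and the returned action strings are hypothetical. The length of the grace period is not given in this document, so it is a parameter rather than a constant. The JSON shape assumed is the one produced by `gh pr view --json state,mergedAt`, where `state` is `OPEN`, `CLOSED`, or `MERGED`.

```python
import json
from datetime import timedelta

ESCALATE_AFTER = timedelta(hours=24)


def pr_outcome(gh_json: str) -> str:
    """Map the PR state JSON to the task's next status."""
    data = json.loads(gh_json)
    if data.get("state") == "MERGED" or data.get("mergedAt"):
        return "COMPLETED"
    if data.get("state") == "CLOSED":
        # Closed without being merged.
        return "BLOCKED"
    return "AWAITING_APPROVAL"


def no_pr_action(
    requires_approval: bool, waited: timedelta, grace_period: timedelta
) -> str:
    """Decide what to do with a task that has no PR URL."""
    if not requires_approval:
        return "complete" if waited >= grace_period else "wait"
    # The caller is assumed to rate-limit reminders to one per hour.
    return "escalate" if waited >= ESCALATE_AFTER else "remind_hourly"
```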

Summary of Existing Strengths

| Capability | Implementation |
| --- | --- |
| Workspace isolation | Per-agent clone directories, cached in SQLite |
| Branch isolation | Unique `<task-id>/<slug>` branches per task |
| Fresh starting point | `git fetch` + `git pull` before each task |
| Worktree support | Detects worktrees, avoids default-branch checkout conflicts |
| Graceful degradation | Silent error suppression for optional git operations |
| Atomic commits | Add-all-then-check-staged pattern in `commit_all()` |
| Subtask accumulation | Shared branch across plan subtasks with final-step merge |
| PR workflow | `gh` CLI integration for create + poll + complete |
| Direct merge workflow | Merge + push with conflict detection and abort |
| Retry resilience | Existing branches reused on task retry |
| Stuck task detection | Escalating reminders for approval-blocked tasks |

Identified Gaps

The following gaps have been identified in the current workflow. Each is labeled G1–G7 for traceability. See specs/git/git.md §11 for the formal gap catalogue with affected code references and violated design principles.

G1. `_merge_and_push` Never Pulls Main Before Merging

The direct-merge path runs checkout main → merge branch → push main, but never runs pull origin main before the merge. If another agent has pushed to main since the workspace's last fetch, the push is rejected as a non-fast-forward. The user is notified of the failure, but nothing recovers from it: the local main keeps a merge commit that the remote does not have.

Scenario: Agent A completes a task and pushes to main. Agent B, whose local main is behind, then merges and tries to push; the push fails. Agent B's next task starts from a diverged main that contains both the local-only merge commit and whatever origin/main has since moved to.

G2. Push Failures Leave Workspace in a Dirty State

After a failed push in _merge_and_push, there is no git reset to undo the local merge. The workspace's main branch contains a merge commit that only exists locally. Subsequent tasks from the same agent inherit this dirty state.

Compounding effect: Each failed push adds another local-only merge commit. Over time the workspace's main diverges further from origin/main, making future merges and pushes increasingly unlikely to succeed.
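
One way to close G1 and G2 together is sketched below: update the local default branch before merging, and reset it to the remote-tracking ref if the push is rejected. This is an illustration of the missing steps, not a description of what `_merge_and_push` does today. The function name, the `--ff-only` and `--no-edit` flags, and the injectable `run` parameter are all assumptions.

```python
import subprocess


def merge_and_push_safely(
    repo_dir: str, branch: str, default_branch: str = "main", run=subprocess.run
) -> bool:
    """Merge `branch` into the default branch and push, leaving no
    local-only merge commit behind on failure (sketch)."""

    def git(*args: str) -> bool:
        return run(["git", *args], cwd=repo_dir).returncode == 0

    if not git("checkout", default_branch):
        return False
    # G1: bring the local default branch up to date before merging.
    if not git("pull", "--ff-only", "origin", default_branch):
        return False
    if not git("merge", "--no-edit", branch):
        git("merge", "--abort")
        return False
    if git("push", "origin", default_branch):
        return True
    # G2: the push was rejected, so drop the local-only merge commit.
    # The work itself is still safe on the task branch.
    git("reset", "--hard", f"origin/{default_branch}")
    return False
```

Because the reset targets the remote-tracking ref rather than a fixed number of commits, repeated failures do not accumulate local-only merges.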

G3. No Merge Conflict Recovery Strategy

When merge_branch() detects conflicts, it aborts the merge and notifies the user. There is no automated attempt to:

- Rebase the task branch onto the latest origin/main.
- Retry the merge after the rebase.
- Create a PR instead (as a fallback for manual resolution).

The agent's completed work is stranded on its task branch until a human resolves the conflict manually.

G4. Retried Tasks Don't Rebase onto Latest Main

When prepare_for_task() finds the task branch already exists (retry after crash or failure), it falls back to git checkout <branch_name> without rebasing onto origin/main. The agent resumes work on code that may be many commits behind the current remote state.

This applies to both the normal-clone and worktree paths in prepare_for_task().

G5. No `--force-with-lease` for PR Branch Pushes (RESOLVED)

~~push_branch() uses git push origin <branch>. On retry — where the branch was already pushed in a previous attempt — the push fails.~~

Resolution: push_branch() now accepts force_with_lease=True, which adds --force-with-lease to the push command. The orchestrator uses this when pushing task branches for PR creation, making retries idempotent while still preventing accidental overwrites of other people's changes.
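
The effect of the flag on the push command can be shown with a small builder. The `force_with_lease` parameter name comes from the resolution above; the `push_command` helper itself is hypothetical and only constructs the argument list.

```python
from typing import List


def push_command(branch: str, force_with_lease: bool = False) -> List[str]:
    """Build the git push argument list for a task branch (sketch)."""
    cmd = ["git", "push"]
    if force_with_lease:
        # Overwrite the remote branch only if it still points where our
        # remote-tracking ref says it does.
        cmd.append("--force-with-lease")
    cmd += ["origin", branch]
    return cmd
```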

G6. Subtask Chains Accumulate Drift

Plan subtasks share a branch and commit sequentially via switch_to_branch(). While this correctly picks up the previous subtask's commits, it never rebases onto the latest origin/main. Over a long chain (5–10 subtasks), the branch drifts progressively further from main.

Example timeline:

gitGraph
    commit id: "A"
    branch task-branch
    commit id: "X1" tag: "subtask 1"
    commit id: "X2" tag: "subtask 2"
    commit id: "X3" tag: "subtask 3"
    commit id: "X4" tag: "subtask 4"
    checkout main
    commit id: "B"
    commit id: "C"
    commit id: "D"
    commit id: "E"
    commit id: "F"

By the time the final subtask merges, the branch is 5 commits behind main and 4 commits ahead, maximizing conflict surface.
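
Drift of this kind can be measured with `git rev-list --left-right --count`, which prints two tab-separated counts for a symmetric difference. The helpers below are a hypothetical sketch for building that command and parsing its output; nothing in the current workflow is stated to do this.

```python
from typing import List, Tuple


def drift_command(branch: str, default_branch: str = "main") -> List[str]:
    """Command whose output is '<behind>\\t<ahead>' for `branch`."""
    return [
        "git", "rev-list", "--left-right", "--count",
        f"origin/{default_branch}...{branch}",
    ]


def parse_drift(output: str) -> Tuple[int, int]:
    """Parse the counts into (behind, ahead).

    The left count is commits only on origin/<default_branch> (behind);
    the right count is commits only on the task branch (ahead).
    """
    behind, ahead = (int(field) for field in output.split())
    return behind, ahead
```

For the timeline above, the command would print `5` and `4`, which `parse_drift` returns as 5 behind and 4 ahead.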

G7. LINK Repos with Shared Filesystem

LINK repos use the source path directly as the workspace for all agents:

workspace = repo.source_path    # same for every agent

Without worktrees or per-agent clones, concurrent agents on a LINK repo share the git index, staging area, and working tree. Operations from one agent (e.g. git checkout, git add -A) directly interfere with the other.

Current mitigation: Low probability — most LINK projects are single-agent. But the system does not enforce this constraint, so it is a latent failure mode.
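
The unenforced constraint could be stated as a guard like the one below. This is purely illustrative: the `can_start_agent` function and its parameters are hypothetical, and the system is not described as performing any such check.

```python
def can_start_agent(source_type: str, active_agents_on_repo: int) -> bool:
    """Allow at most one concurrent agent on a LINK repo (sketch)."""
    if source_type == "LINK":
        # All agents would share one index and working tree.
        return active_agents_on_repo == 0
    # CLONE and INIT repos have per-agent directories.
    return True
```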

Gap Summary

| Gap | Severity | Single-Agent Impact | Multi-Agent Impact |
| --- | --- | --- | --- |
| G1 | High | None (only one pusher) | Push fails whenever another agent has pushed since the last fetch |
| G2 | High | None | Cascading workspace drift after first push failure |
| G3 | Medium | Rare conflicts | Frequent conflicts as main moves fast |
| G4 | Medium | Stale retry | Stale retry with higher conflict risk |
| G5 | ~~Low~~ | ~~Rare~~ | RESOLVED: `push_branch` supports `--force-with-lease` |
| G6 | Medium | No drift (only agent) | Branch falls behind during long chains |
| G7 | High | N/A | Concurrent agents interfere with each other's index and working tree |