Git Sync for Agent Workspaces: Identified Gaps
Source files: src/git/manager.py, src/orchestrator.py
Companion doc: Current State
This document catalogues the gaps in the current git sync workflow that can cause merge conflicts, failed pushes, and stale code when multiple agents work concurrently on the same repository. Each gap includes the relevant code locations, a description of the failure mode, and the impact on the system.
Gap 1: _merge_and_push() Never Pulls Main Before Merging
Files: src/orchestrator.py (_merge_and_push, line 866),
src/git/manager.py (merge_branch, line 162)
What happens today
_merge_and_push()
  └─ merge_branch()
       ├─ git checkout main
       ├─ git merge <task-branch>          ← merges into *local* main
       └─ (no fetch or pull of origin/main)
  └─ push_branch(main)                     ← may fail with non-fast-forward
merge_branch() checks out the local main and merges the task branch into
it, then _merge_and_push() pushes main to origin. There is no
git fetch origin or git pull origin main before the merge.
Failure mode
If another agent (or a human) pushed to origin/main since this workspace last
pulled, the local main is behind the remote. The merge succeeds locally, but
the subsequent push is rejected with a non-fast-forward error. The user is
notified, but the merge is not reverted.
Impact
- Frequency: Grows with the number of concurrent agents. With 3+ agents completing tasks within minutes of each other, a rejected push is virtually guaranteed.
- Severity: High — the push failure blocks code from reaching the remote, and leaves the workspace in a diverged state (see Gap 2).
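One way to close this gap is to bring local main level with origin before merging. The sketch below is illustrative only: `sync_main_then_merge` and the `GitRunner` callable are hypothetical names standing in for whatever `GitManager._run` does, and the real method signatures may differ.

```python
from typing import Callable, List

# Hypothetical stand-in for GitManager._run: receives the git arguments
# and is expected to raise on a non-zero exit code.
GitRunner = Callable[[List[str]], None]


def sync_main_then_merge(run: GitRunner, task_branch: str,
                         default_branch: str = "main") -> None:
    """Fast-forward local main to origin, then merge the task branch."""
    run(["fetch", "origin"])
    run(["checkout", default_branch])
    # --ff-only never creates a merge commit: it either moves local main
    # to origin's tip or fails if local main has diverged.
    run(["merge", "--ff-only", f"origin/{default_branch}"])
    run(["merge", task_branch])
```

This narrows the race window but does not remove it: another agent can still push between the fetch and the final push, so the push step needs its own failure handling.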
Gap 2: Push Failures Leave Workspace in a Dirty State
File: src/orchestrator.py (_merge_and_push, lines 877–884)
What happens today
if repo.source_type == RepoSourceType.CLONE:
    try:
        self.git.push_branch(workspace, repo.default_branch)
    except Exception as e:
        await self._notify_channel(...)
        # ← local main still has the merge commit; no rollback
When the push to origin/main fails, the error is caught and the user is
notified, but the local main branch retains the merge commit that could not
be pushed. No git reset or rollback is performed.
Failure mode
The next task assigned to this agent inherits a local main that has diverged
from origin/main. prepare_for_task() will git pull origin main at the
start, but this pull itself may fail or produce a merge commit, compounding the
divergence. Subsequent merges and pushes from this workspace become
increasingly likely to fail.
Impact
- Cascading: One push failure poisons the workspace for all future tasks.
- Hard to diagnose: The user sees repeated push failures from the same agent without an obvious root cause.
- Recovery: Currently requires a manual git reset --hard origin/main in the workspace.
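The manual recovery step could be automated at the point where the push fails. The following is a minimal sketch, assuming the default branch is still checked out after the merge; `push_or_rollback`, the `run` callable, and the local `GitError` class are hypothetical stand-ins so the example runs on its own.

```python
from typing import Callable, List


class GitError(Exception):
    """Stand-in for the project's GitError, defined here for self-containment."""


def push_or_rollback(run: Callable[[List[str]], None],
                     default_branch: str = "main") -> bool:
    """Push the default branch; on failure, reset it to origin's tip."""
    try:
        run(["push", "origin", default_branch])
        return True
    except GitError:
        # Drop the unpushed merge commit so the next task starts from a
        # local main that matches the remote again.
        run(["fetch", "origin"])
        run(["reset", "--hard", f"origin/{default_branch}"])
        return False
```

The task's commits are not lost by the reset because they still live on the task branch. Note that `reset --hard` also discards uncommitted changes in the working tree, so this only fits a workspace that is expected to be clean at this point.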
Gap 3: No Merge Conflict Recovery Strategy
Files: src/git/manager.py (merge_branch, lines 162–173),
src/orchestrator.py (_merge_and_push, lines 868–875)
What happens today
def merge_branch(self, checkout_path, branch_name, default_branch="main"):
    self._run(["checkout", default_branch], cwd=checkout_path)
    try:
        self._run(["merge", branch_name], cwd=checkout_path)
        return True
    except GitError:
        self._run(["merge", "--abort"], cwd=checkout_path)
        return False  # ← caller notifies user, no retry
When a merge conflict is detected, the merge is aborted and the user is notified with a "manual resolution needed" message. There is no automated attempt to rebase the task branch onto the latest main, no retry loop, and no escalation path beyond the single notification.
Failure mode
Many merge conflicts are trivially resolvable — for example, when two
agents edited different files but both touched a lockfile or auto-generated
file. A rebase onto updated main would resolve these without human
intervention. Instead, the task's code changes are stranded on an unmerged
branch until a human acts.
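A single rebase-and-retry pass could look roughly like the sketch below. It only illustrates the control flow with clean aborts; whether a rebase actually clears a given conflict depends on the conflict. `merge_with_rebase_retry`, `run`, and the local `GitError` are hypothetical names, not the project's API.

```python
from typing import Callable, List


class GitError(Exception):
    """Stand-in for the project's GitError, defined here for self-containment."""


def merge_with_rebase_retry(run: Callable[[List[str]], None], task_branch: str,
                            default_branch: str = "main") -> bool:
    """Merge the task branch; on conflict, rebase it once and try again."""
    run(["checkout", default_branch])
    try:
        run(["merge", task_branch])
        return True
    except GitError:
        run(["merge", "--abort"])

    # Replay the task branch on top of the latest remote main.
    run(["fetch", "origin"])
    run(["checkout", task_branch])
    try:
        run(["rebase", f"origin/{default_branch}"])
    except GitError:
        run(["rebase", "--abort"])  # leaves the task branch as it was
        return False

    run(["checkout", default_branch])
    try:
        run(["merge", task_branch])
        return True
    except GitError:
        run(["merge", "--abort"])
        return False  # caller escalates to a human
```

A `False` result still means "manual resolution needed", but only after one automated attempt has been made and rolled back.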
Impact
- Operational burden: Every merge conflict requires manual intervention, even when the conflict is trivial.
- Delayed delivery: Code that is otherwise correct sits unmerged until a human notices the notification and resolves the conflict.
- Scales poorly: With N agents, the probability of at least one merge conflict per cycle grows quickly.
Gap 4: Retried Tasks Don't Rebase onto Latest Main
File: src/git/manager.py (prepare_for_task, lines 86–128)
What happens today
# Normal repo path (line 116-128):
try:
    self._run(["checkout", "-b", branch_name], cwd=checkout_path)
except GitError:
    # Branch already exists (e.g. task retried after restart);
    # switch to it instead of failing.
    self._run(["checkout", branch_name], cwd=checkout_path)

# Worktree path (line 107-115):
try:
    self._run(["checkout", "-b", branch_name, f"origin/{default_branch}"], ...)
except GitError:
    self._run(["checkout", branch_name], cwd=checkout_path)
When a task retries and the branch already exists, both code paths fall back to
a plain git checkout <branch>. The branch is not rebased onto the latest
origin/main, even though git fetch origin has already been run (line 105).
Failure mode
A task fails and is retried hours later. In the meantime, other agents have
pushed changes to main. The retried task resumes work on a branch that was
forked from a now-stale version of main. The agent may:
- Work with outdated code (e.g. calling a function that was renamed).
- Produce changes that conflict with work that has landed since.
- Fail at merge time due to accumulated drift.
Impact
- Stale code: The agent works against an outdated view of the repository.
- Wasted compute: The agent may complete work that cannot merge cleanly.
- Increases with retry delay: The longer between the original attempt and the retry, the more stale the branch becomes.
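The retry path could attempt a rebase after switching to the existing branch. This is a sketch under assumptions, not the current behaviour: `resume_or_create_branch`, `run`, and the local `GitError` are hypothetical, and the string return values exist only to make the outcome visible to a caller.

```python
from typing import Callable, List


class GitError(Exception):
    """Stand-in for the project's GitError, defined here for self-containment."""


def resume_or_create_branch(run: Callable[[List[str]], None], branch_name: str,
                            default_branch: str = "main") -> str:
    """Create the task branch, or resume it and rebase onto origin's main."""
    run(["fetch", "origin"])
    try:
        run(["checkout", "-b", branch_name, f"origin/{default_branch}"])
        return "created"
    except GitError:
        # Branch already exists (retry): switch to it.
        run(["checkout", branch_name])

    try:
        run(["rebase", f"origin/{default_branch}"])
        return "rebased"
    except GitError:
        # Conflicts during replay: restore the branch and report staleness.
        run(["rebase", "--abort"])
        return "stale"
```

Rebasing rewrites the branch's history, so if the branch was already pushed in an earlier attempt, the next push needs `--force-with-lease`. A `"stale"` result gives the orchestrator a chance to warn the user before the agent spends compute on an outdated base.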
Gap 5: No --force-with-lease for PR Branch Pushes (RESOLVED)
Resolution: push_branch() now accepts a force_with_lease keyword argument.
When True, --force-with-lease is added to the push command. The orchestrator's
_create_pr_for_task() passes force_with_lease=True when pushing task branches
for PR creation, making retries idempotent while preventing accidental overwrites
of other people's changes.
Previously: push_branch used a plain git push origin <branch> without
--force-with-lease, causing retry failures when the branch had already been
pushed in a previous attempt.
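The body of push_branch() is not shown in this document, so the following is only a guess at how the flag might be assembled. `build_push_args` is a hypothetical helper; the `force_with_lease` keyword mirrors the one described above.

```python
from typing import List


def build_push_args(branch: str, *, force_with_lease: bool = False) -> List[str]:
    """Build the argument list for `git push origin <branch>`."""
    args = ["push"]
    if force_with_lease:
        # Overwrite the remote branch only if it still points at the
        # commit our remote-tracking ref last saw.
        args.append("--force-with-lease")
    args += ["origin", branch]
    return args
```

Calling `build_push_args("task-1", force_with_lease=True)` yields the arguments for `git push --force-with-lease origin task-1`, while the default keeps the plain push used for the default branch.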
Gap 6: Subtask Chains Accumulate Drift from Main
Files: src/git/manager.py (switch_to_branch, lines 130–157),
src/orchestrator.py (_prepare_workspace, line 759)
What happens today
Plan subtasks share a branch and commit sequentially. Each subtask calls
switch_to_branch(), which does the following:
def switch_to_branch(self, checkout_path, branch_name):
    self._run(["fetch", "origin"], cwd=checkout_path)
    self._run(["checkout", branch_name], cwd=checkout_path)
    self._run(["pull", "origin", branch_name], cwd=checkout_path)
    # ← no rebase onto origin/main
The branch is fetched and pulled (to pick up the previous subtask's commits),
but it is never rebased onto origin/main. Over a chain of N subtasks,
the branch drifts further from main with each step.
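Because the subtask branch is shared and already pushed, rebasing it would require a force push between subtasks. An alternative, shown here as a swapped-in technique rather than the rebase named above, is to merge origin/main into the branch at each switch. `switch_and_refresh`, `run`, and the local `GitError` are hypothetical names for this sketch.

```python
from typing import Callable, List


class GitError(Exception):
    """Stand-in for the project's GitError, defined here for self-containment."""


def switch_and_refresh(run: Callable[[List[str]], None], branch_name: str,
                       default_branch: str = "main") -> bool:
    """Switch to the shared subtask branch and merge in the latest main."""
    run(["fetch", "origin"])
    run(["checkout", branch_name])
    run(["pull", "origin", branch_name])
    try:
        # Merging (not rebasing) keeps existing commits intact, so the
        # branch can still be pushed without force.
        run(["merge", f"origin/{default_branch}"])
        return True
    except GitError:
        run(["merge", "--abort"])
        return False  # drift has become a real conflict; surface it now
```

A `False` result surfaces the conflict at the subtask where it first appears instead of at the end of the chain. The trade-off is extra merge commits on the shared branch.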
Failure mode
A plan with 5–10 subtasks takes 30–60 minutes to complete. During that time,
other agents may push multiple changes to main. By the time the final subtask
attempts to merge, the shared branch has diverged significantly from main,
causing:
- Merge conflicts that would not have occurred if the branch had been periodically rebased.
- The agent working with stale dependencies, APIs, or generated code.
- Larger, harder-to-review diffs at PR time.
Impact
- Conflict probability scales with chain length: Longer plans → more drift → higher conflict rate.
- Late discovery: Conflicts are only detected at the very end of the chain, after all subtasks have completed. All the subtask work may need to be redone.
- No incremental feedback: Intermediate subtasks have no signal that their base is drifting.
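Intermediate subtasks could get a drift signal by counting how many commits the branch is behind origin/main. The sketch below uses `git rev-list --left-right --count`, which prints two tab-separated counts; `branch_drift` and `parse_left_right_count` are hypothetical helpers, and the function assumes `git fetch origin` has already run.

```python
import subprocess
from typing import Tuple


def parse_left_right_count(output: str) -> Tuple[int, int]:
    """Parse the '<left>\\t<right>' output of rev-list --left-right --count."""
    left, right = output.split()
    return int(left), int(right)


def branch_drift(checkout_path: str,
                 default_branch: str = "main") -> Tuple[int, int]:
    """Return (behind, ahead) of HEAD relative to origin/<default_branch>."""
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count",
         f"origin/{default_branch}...HEAD"],
        cwd=checkout_path, check=True, capture_output=True, text=True,
    )
    # Left side: commits only on origin/main (how far behind we are).
    # Right side: commits only on HEAD (how far ahead we are).
    return parse_left_right_count(result.stdout)
```

The orchestrator could log the `behind` count before each subtask, or trigger a refresh once it crosses a threshold of its choosing.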
Gap 7: LINK Repos with Shared Filesystem (No Agent Isolation)
File: src/orchestrator.py (_compute_workspace_path, lines 655–677)
What happens today
def _compute_workspace_path(self, agent, project_id, repo):
    if repo.source_type == RepoSourceType.LINK:
        return repo.source_path  # ← same path for ALL agents
    ...
For LINK repos, every agent is given the same workspace_path. Unlike CLONE
repos (which get per-agent directories), LINK repos share a single filesystem
directory. There is no file-level locking, no worktree isolation, and no
mechanism to prevent concurrent access.
Failure mode
Two agents assigned tasks on the same LINK repo will:
- Clobber each other's branch state: Agent A checks out branch-A, then Agent B checks out branch-B. Agent A's working tree now has branch-B's files, so any subsequent commit by Agent A goes to the wrong branch.
- Corrupt the index: Concurrent git add -A + git commit operations can produce corrupt or mixed commits.
- Race on checkout: prepare_for_task() does checkout main → pull → checkout -b <branch>. If two agents interleave these steps, one may create its branch from the other's task branch rather than from main.
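As a stopgap, access to a shared path could be serialized with one lock per workspace. This is a minimal sketch assuming all agents run inside a single asyncio process, which this document does not confirm; `WorkspaceLocks` and `run_exclusive` are hypothetical names.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, TypeVar

T = TypeVar("T")


class WorkspaceLocks:
    """One asyncio.Lock per workspace path, created on first use."""

    def __init__(self) -> None:
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run_exclusive(self, workspace_path: str,
                            op: Callable[[], Awaitable[T]]) -> T:
        # Only one coroutine at a time may operate on a given path;
        # different paths do not block each other.
        async with self._locks[workspace_path]:
            return await op()
```

To prevent the interleavings listed above, the lock would have to be held for the whole task, from checkout through commit, which removes concurrency on that repo entirely. It also offers no protection across processes, so it is a mitigation rather than real isolation.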
Impact
- Data loss: Agent work can be committed to the wrong branch or lost entirely.
- Silent corruption: The system won't detect that agents are interfering with each other; commits simply end up in unexpected places.
- Workaround exists but isn't used: GitManager already has create_worktree() / remove_worktree() methods, but the workspace preparation code never uses them for LINK repos.
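Per-agent worktrees would give each agent its own working tree and index while sharing one object store. The signatures of create_worktree() and remove_worktree() are not shown in this document, so the sketch below does not call them; `agent_worktree_path`, `worktree_add_args`, and the `worktrees_root` parameter are hypothetical.

```python
from pathlib import Path
from typing import List


def agent_worktree_path(source_path: str, agent_id: str,
                        worktrees_root: str) -> Path:
    """Derive a distinct directory per (repo, agent) pair."""
    repo_name = Path(source_path).name
    return Path(worktrees_root) / repo_name / agent_id


def worktree_add_args(path: Path, branch_name: str,
                      default_branch: str = "main") -> List[str]:
    """Arguments for: git worktree add -b <new-branch> <path> <commit-ish>."""
    return ["worktree", "add", "-b", branch_name, str(path),
            f"origin/{default_branch}"]
```

With this layout, two agents on the same LINK repo would receive different paths, and the command would be run from the linked repo's directory. Git refuses to check out the same branch in two worktrees at once, which also guards against two agents sharing a task branch by accident.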
Summary of Gaps
| # | Gap | Root Cause | Severity |
|---|---|---|---|
| 1 | No pull before merge in _merge_and_push() | Missing fetch + pull before merge_branch() | High |
| 2 | Push failure leaves diverged local main | No rollback after failed push | High |
| 3 | No merge conflict recovery | Abort + notify only, no rebase/retry | Medium |
| 4 | Retried tasks work on stale branches | No rebase on retry, just checkout | Medium |
| 5 | ~~PR branch push lacks --force-with-lease~~ | ~~Plain git push without safety flags~~ | RESOLVED |
| 6 | Subtask chains drift from main | No periodic rebase during subtask chain | Medium |
| 7 | LINK repos share filesystem across agents | _compute_workspace_path returns same path | High |
Priority Assessment
Must fix for reliable multi-agent operation (Gaps 1, 2, 7): These gaps cause outright failures or data corruption in the most common multi-agent scenario (several agents completing tasks on the same repo within a short window).
Should fix for operational efficiency (Gaps 3 and 4; Gap 5 is already resolved): These gaps cause avoidable manual intervention and wasted agent compute. They become more painful as the number of agents and task throughput increase.
Should fix for long-running plans (Gap 6): This gap specifically affects plan-based workflows with many subtasks. It's less urgent if most plans have ≤3 subtasks, but becomes critical for longer chains.