Every engineering leader I talk to eventually says some version of this:
"We tried Cursor, Copilot, Claude, or another coding agent on our legacy module. It broke things. The agent does not understand the codebase."
They are usually right about the failure and wrong about the diagnosis.
The problem is rarely that the model is incapable of making code changes. The problem is that legacy code gives the agent no safe way to verify whether the change is correct.
The agent is not failing because the code is old.
It is failing because the code is unprotected.
Why agents fail on legacy code
Coding agents are very good at making targeted changes when they have a feedback loop.
Give an agent a failing test and it can often implement the change quickly. Give it a healthy test suite and it can refactor while staying inside the guardrails.
Now remove the tests.
The agent reads a large service class with hidden side effects. It changes a conditional. It extracts a method. It moves dependencies around. But it has no reliable way to know whether the behavior is still correct. So it guesses.
That is exactly why Michael Feathers' framing still matters: legacy code is code without tests.
The key insight is still the same today:
- before you change behavior, expose seams
- before you refactor, pin current behavior
- before you trust the agent, give it a safety net
Without that safety net, the agent is not refactoring. It is gambling.
What legacy code actually looks like
When people say "legacy module," they usually do not mean a clean old codebase. They mean something like this:
@Service
public class OrderProcessingService {

    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private RestTemplate restTemplate;

    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    @Value("${pricing.api.url}")
    private String pricingApiUrl;

    @Value("${reports.output.dir}")
    private String reportsDir;

    @Autowired
    private NotificationClient notificationClient;

    @Transactional
    public OrderResult processOrder(OrderRequest request) {
        Customer customer = entityManager.createQuery(
                "SELECT c FROM Customer c WHERE c.id = :id", Customer.class)
            .setParameter("id", request.getCustomerId())
            .getSingleResult();

        PricingRules rules = (PricingRules) redisTemplate
            .opsForHash().get("pricing:" + customer.getTier(), "rules");
        if (rules == null) {
            rules = restTemplate.getForObject(
                pricingApiUrl + "/rules?tier=" + customer.getTier(),
                PricingRules.class);
            redisTemplate.opsForHash().put("pricing:" + customer.getTier(),
                "rules", rules);
        }

        BigDecimal discount = BigDecimal.ZERO;
        if (customer.getTier().equals("PLATINUM")) {
            discount = request.getTotal().multiply(new BigDecimal("0.15"));
        } else if (customer.getTier().equals("GOLD")) {
            discount = request.getTotal().multiply(new BigDecimal("0.10"));
        } else if (customer.getTier().equals("SILVER")) {
            discount = request.getTotal().multiply(new BigDecimal("0.05"));
        }
        if (request.getCouponCode() != null) {
            discount = discount.add(new BigDecimal("20.00"));
        }

        BigDecimal adjusted = rules.getMarkup().add(request.getTotal());
        BigDecimal finalAmount = adjusted.subtract(discount);

        Order order = new Order();
        order.setCustomerId(customer.getId());
        order.setOriginalTotal(request.getTotal());
        order.setDiscount(discount);
        order.setFinalAmount(finalAmount);
        order.setStatus("PROCESSED");
        entityManager.persist(order);

        try {
            String reportPath = reportsDir + "/order_" + order.getId() + ".csv";
            Files.write(Paths.get(reportPath),
                ("total,discount,final\n" +
                    request.getTotal() + "," + discount + "," + finalAmount)
                    .getBytes());
        } catch (IOException e) {
            // swallowed
        }

        if (finalAmount.compareTo(new BigDecimal("1000")) > 0) {
            notificationClient.sendWebhook(
                "https://hooks.slack.com/services/T0EXAMPLE/B0EXAMPLE/000000000000",
                "Large order: " + order.getId());
        }

        return new OrderResult(order.getId(), finalAmount, discount);
    }
}
This one method touches:
- the database
- the cache
- an external API
- the filesystem
- a downstream notification client
- business logic
- configuration
That is not one problem. That is several problems collapsed into one method.
If an agent changes this file without tests, it is making changes against invisible side effects. That is where trust collapses.
The method that actually works
Here is the sequence that works better.
1. Map the dependency graph first
Before the agent writes code, make it map the surface area:
Analyze OrderProcessingService.java. For each method, list:
- external dependencies
- internal calls
- configuration usage
- likely seam points for extraction
Output as a markdown table.
The goal is not perfect architecture analysis. The goal is to identify where the code touches the outside world and where pure logic might be extracted safely.
This step is still human-led. The agent can trace mechanics. The engineer decides which seams matter.
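For the service above, the first row of that table might look something like this (illustrative, not exhaustive; the column choices are one reasonable way to slice it):

```
| Method         | External dependencies                                                       | Configuration                        | Candidate seams                                                     |
| -------------- | --------------------------------------------------------------------------- | ------------------------------------ | ------------------------------------------------------------------- |
| processOrder() | EntityManager, RedisTemplate, RestTemplate, Files.write, NotificationClient | pricing.api.url, reports.output.dir  | discount calculation, pricing lookup, report writing, notification  |
```

The discount calculation stands out immediately: it is the only piece with no external dependency at all, which makes it the cheapest seam to extract first.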
2. Write characterization tests before refactoring
Most teams try to jump straight to unit tests. That usually fails because the code was never designed for clean unit boundaries.
What works better is characterization testing: capture current behavior first, even the ugly parts.
@SpringBootTest
@Transactional
class OrderProcessingServiceCharacterizationTest {

    @Autowired OrderProcessingService service;
    @Autowired EntityManager entityManager;

    @Test
    void goldCustomer_noCoupon_getsTenPercent() {
        Customer customer = new Customer();
        customer.setId(1L);
        customer.setTier("GOLD");
        entityManager.persist(customer);

        OrderRequest request = new OrderRequest(1L, new BigDecimal("100.00"), null);
        OrderResult result = service.processOrder(request);

        assertThat(result.getDiscount()).isEqualByComparingTo("10.00");
    }

    @Test
    void couponOnSmallOrder_discountExceedsTotal() {
        Customer customer = new Customer();
        customer.setId(5L);
        customer.setTier("SILVER");
        entityManager.persist(customer);

        OrderRequest request = new OrderRequest(5L, new BigDecimal("10.00"), "SAVE20");
        OrderResult result = service.processOrder(request);

        // Current buggy behavior, pinned intentionally
        assertThat(result.getDiscount()).isEqualByComparingTo("20.50");
    }
}
The point is not to bless the bugs. The point is to make current behavior visible so the agent stops changing behavior by accident.
This is where coding agents help a lot. Once you define the scope clearly, they are effective at generating repetitive characterization coverage faster than most humans want to do it by hand.
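Much of that coverage can even run without Spring by pinning the pure arithmetic directly. Here is a minimal sketch, assuming a Java 16+ toolchain; `DiscountCase` and `currentDiscount()` are hypothetical names, and the logic is copied verbatim from `processOrder()` so any drift shows up immediately:

```java
import java.math.BigDecimal;
import java.util.List;

public class DiscountCharacterization {

    // One pinned observation of current behavior.
    record DiscountCase(String tier, String total, String coupon, String expected) {}

    // Replicates the current discount logic verbatim, bugs included.
    static BigDecimal currentDiscount(String tier, BigDecimal total, String coupon) {
        BigDecimal discount = BigDecimal.ZERO;
        if (tier.equals("PLATINUM")) {
            discount = total.multiply(new BigDecimal("0.15"));
        } else if (tier.equals("GOLD")) {
            discount = total.multiply(new BigDecimal("0.10"));
        } else if (tier.equals("SILVER")) {
            discount = total.multiply(new BigDecimal("0.05"));
        }
        if (coupon != null) {
            discount = discount.add(new BigDecimal("20.00"));
        }
        return discount;
    }

    public static void main(String[] args) {
        List<DiscountCase> pinned = List.of(
            new DiscountCase("GOLD", "100.00", null, "10.00"),
            new DiscountCase("PLATINUM", "200.00", null, "30.00"),
            new DiscountCase("SILVER", "10.00", "SAVE20", "20.50"), // bug, pinned on purpose
            new DiscountCase("BRONZE", "50.00", null, "0")          // unknown tier: no discount
        );
        for (DiscountCase c : pinned) {
            BigDecimal actual = currentDiscount(c.tier(), new BigDecimal(c.total()), c.coupon());
            if (actual.compareTo(new BigDecimal(c.expected())) != 0) {
                throw new AssertionError(c + " -> got " + actual);
            }
        }
        System.out.println("All pinned cases hold");
    }
}
```

The agent can churn out dozens of rows like these once you hand it the table shape; the engineer only reviews the pinned values.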
3. Refactor one seam at a time
Once behavior is pinned, constrain the agent to one change at a time.
Extract the discount calculation logic from processOrder() into a package-private
method called calculateDiscount(String tier, BigDecimal total, String couponCode).
Do not change behavior. Run the characterization tests after the change.
If any test fails, stop and explain why.
That produces a smaller, verifiable step:
BigDecimal calculateDiscount(String tier, BigDecimal total, String couponCode) {
    BigDecimal discount = BigDecimal.ZERO;
    if (tier.equals("PLATINUM")) {
        discount = total.multiply(new BigDecimal("0.15"));
    } else if (tier.equals("GOLD")) {
        discount = total.multiply(new BigDecimal("0.10"));
    } else if (tier.equals("SILVER")) {
        discount = total.multiply(new BigDecimal("0.05"));
    }
    if (couponCode != null) {
        discount = discount.add(new BigDecimal("20.00"));
    }
    return discount;
}
Then move to the next seam. Not ten refactorings in one prompt. One seam, one verification cycle.
4. Extract interfaces around infrastructure
After the logic starts separating, extract the external concerns:
- pricing lookup
- report writing
- notification sending
- repository access
That turns a monolith of mixed concerns into testable components:
public interface PricingService {
    PricingRules getRules(String tier);
}

public interface ReportWriter {
    void writeOrderReport(Order order);
}
Now the main service becomes smaller, clearer, and more mockable. More importantly, the agent now has cleaner boundaries for future changes.
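With those seams in place, the core pricing flow can be exercised with an in-memory fake instead of Redis and HTTP. A minimal sketch; `FixedPricingService` and `finalAmount()` are illustrative names, and the interface is simplified here to return the markup directly rather than a full `PricingRules` object:

```java
import java.math.BigDecimal;

public class SeamDemo {

    // Simplified seam for this sketch: just the markup, no caching concerns.
    interface PricingService {
        BigDecimal getMarkup(String tier);
    }

    // Deterministic test double: no Redis, no HTTP, no cache-miss branch.
    static class FixedPricingService implements PricingService {
        public BigDecimal getMarkup(String tier) {
            return new BigDecimal("5.00");
        }
    }

    // The pricing flow, now expressed against the seam instead of infrastructure.
    static BigDecimal finalAmount(PricingService pricing, String tier,
                                  BigDecimal total, BigDecimal discount) {
        BigDecimal adjusted = pricing.getMarkup(tier).add(total);
        return adjusted.subtract(discount);
    }

    public static void main(String[] args) {
        BigDecimal result = finalAmount(new FixedPricingService(), "GOLD",
                new BigDecimal("100.00"), new BigDecimal("10.00"));
        System.out.println(result); // markup 5.00 + total 100.00 - discount 10.00
    }
}
```

In the real codebase the production implementation of the seam would wrap the existing cache-then-API lookup, so behavior stays identical while tests get a fake.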
5. Only then let the agent modernize
Once the safety net exists and the seams are exposed, the agent becomes useful for the higher-leverage cleanup:
- replace brittle conditionals
- fix intentionally identified bugs
- add proper error handling
- isolate side effects
- improve naming and structure
At that point the agent is no longer guessing inside a black box. It is operating inside a checked workflow.
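One example of that cleanup: the tier if/else chain can collapse into an enum that owns its rate. A sketch with illustrative names (`Tier`, `tierDiscount()`); results are compared numerically so BigDecimal scale differences do not matter:

```java
import java.math.BigDecimal;

public class TierDiscount {

    // Each tier carries its own rate; unknown tiers fall back to zero.
    enum Tier {
        PLATINUM("0.15"), GOLD("0.10"), SILVER("0.05"), OTHER("0");

        final BigDecimal rate;
        Tier(String rate) { this.rate = new BigDecimal(rate); }

        static Tier from(String name) {
            try {
                return valueOf(name);
            } catch (IllegalArgumentException e) {
                return OTHER;
            }
        }
    }

    static BigDecimal tierDiscount(String tier, BigDecimal total) {
        return total.multiply(Tier.from(tier).rate);
    }

    public static void main(String[] args) {
        // Same results as the original if/else chain, checked by comparison.
        BigDecimal total = new BigDecimal("100.00");
        if (tierDiscount("GOLD", total).compareTo(new BigDecimal("10.00")) != 0
                || tierDiscount("BRONZE", total).compareTo(BigDecimal.ZERO) != 0) {
            throw new AssertionError("tier discount drifted from pinned behavior");
        }
        System.out.println("tier discounts match");
    }
}
```

Crucially, this change only becomes safe because the characterization tests from step 2 would catch any tier whose discount drifts.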
What this looks like in practice
This is not a one-prompt miracle. It is a disciplined sequence:
Phase 1 - Map the surface area
- identify dependencies
- find the seams
- separate pure logic from infrastructure-heavy methods
Phase 2 - Pin current behavior
- write characterization tests
- include current bugs if they are part of real behavior
- make unintended change visible
Phase 3 - Extract and simplify
- pull pure logic into testable units
- isolate external concerns behind interfaces
- reduce the blast radius of each change
Phase 4 - Modernize safely
- replace brittle structures
- fix known bugs intentionally
- improve error handling and boundaries
The exact timeline will vary by codebase. That is not the real point. The real point is that once the code is under test, the agent stops guessing and starts working inside a system that can reject bad changes quickly.
The uncomfortable truth about coding agents
Coding agents are not magic refactoring engines.
They are extremely fast implementation engines that still need:
- constraints
- verification
- small change boundaries
- human review
The teams getting real value from coding agents on legacy code are not the ones using the fanciest prompts. They are the ones applying old, boring, correct engineering discipline - seams, tests, small verified changes - and then letting the agent accelerate the mechanical work.
Michael Feathers' playbook still holds. What changed is that agents make the execution faster once the safety net exists.
If your coding agent cannot refactor your legacy code, do not blame the model first. Ask whether the code has a safety net.
At CoEdify, we use coding agents the same way we use any serious engineering tool: inside a checked workflow. On legacy systems, the difference between a useful agent and a dangerous one is almost always the quality of the safety net around it. [coedify.com]