Every engineering leader I talk to eventually says some version of this:
"We tried Cursor, Copilot, Claude, or another coding agent on our legacy module. It broke things. The agent does not understand the codebase."
They are usually right about the failure and wrong about the diagnosis.
The problem is rarely that the model is incapable of making code changes. The problem is that legacy code gives the agent no safe way to verify whether the change is correct.
The agent is not failing because the code is old.
It is failing because the code is unprotected.
Why agents fail on legacy code
Coding agents are very good at making targeted changes when they have a feedback loop.
Give an agent a failing test and it can often implement the change quickly. Give it a healthy test suite and it can refactor while staying inside the guardrails.
Now remove the tests.
The agent reads a large service class with hidden side effects. It changes a conditional. It extracts a method. It moves dependencies around. But it has no reliable way to know whether the behavior is still correct. So it guesses.
That is exactly why Michael Feathers' framing still matters: legacy code is code without tests.
The key insight is still the same today:
- before you change behavior, expose seams
- before you refactor, pin current behavior
- before you trust the agent, give it a safety net
Without that safety net, the agent is not refactoring. It is gambling.
What legacy code actually looks like
When people say "legacy module," they usually do not mean a clean old codebase. They mean something like this:
@Service
public class OrderProcessingService {

    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private RestTemplate restTemplate;

    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    @Value("${pricing.api.url}")
    private String pricingApiUrl;

    @Value("${reports.output.dir}")
    private String reportsDir;

    @Autowired
    private NotificationClient notificationClient;

    @Transactional
    public OrderResult processOrder(OrderRequest request) {
        Customer customer = entityManager.createQuery(
                "SELECT c FROM Customer c WHERE c.id = :id", Customer.class)
            .setParameter("id", request.getCustomerId())
            .getSingleResult();

        PricingRules rules = (PricingRules) redisTemplate
            .opsForHash().get("pricing:" + customer.getTier(), "rules");
        if (rules == null) {
            rules = restTemplate.getForObject(
                pricingApiUrl + "/rules?tier=" + customer.getTier(),
                PricingRules.class);
            redisTemplate.opsForHash().put("pricing:" + customer.getTier(),
                "rules", rules);
        }

        BigDecimal discount = BigDecimal.ZERO;
        if (customer.getTier().equals("PLATINUM")) {
            discount = request.getTotal().multiply(new BigDecimal("0.15"));
        } else if (customer.getTier().equals("GOLD")) {
            discount = request.getTotal().multiply(new BigDecimal("0.10"));
        } else if (customer.getTier().equals("SILVER")) {
            discount = request.getTotal().multiply(new BigDecimal("0.05"));
        }
        if (request.getCouponCode() != null) {
            discount = discount.add(new BigDecimal("20.00"));
        }

        BigDecimal adjusted = rules.getMarkup().add(request.getTotal());
        BigDecimal finalAmount = adjusted.subtract(discount);

        Order order = new Order();
        order.setCustomerId(customer.getId());
        order.setOriginalTotal(request.getTotal());
        order.setDiscount(discount);
        order.setFinalAmount(finalAmount);
        order.setStatus("PROCESSED");
        entityManager.persist(order);

        try {
            String reportPath = reportsDir + "/order_" + order.getId() + ".csv";
            Files.write(Paths.get(reportPath),
                ("total,discount,final\n" +
                    request.getTotal() + "," + discount + "," + finalAmount)
                    .getBytes());
        } catch (IOException e) {
            // swallowed
        }

        if (finalAmount.compareTo(new BigDecimal("1000")) > 0) {
            notificationClient.sendWebhook(
                "https://hooks.slack.com/services/T0EXAMPLE/B0EXAMPLE/000000000000",
                "Large order: " + order.getId());
        }

        return new OrderResult(order.getId(), finalAmount, discount);
    }
}
This one method touches:
- the database
- the cache
- an external API
- the filesystem
- a downstream notification client
- business logic
- configuration
That is not one problem. That is several problems collapsed into one method.
If an agent changes this file without tests, it is making changes against invisible side effects. That is where trust collapses.
The method that actually works
Here is the sequence that works better.
1. Map the dependency graph first
Before the agent writes code, make it map the surface area:
Analyze OrderProcessingService.java. For each method, list:
- external dependencies
- internal calls
- configuration usage
- likely seam points for extraction
Output as a markdown table.
The goal is not perfect architecture analysis. The goal is to identify where the code touches the outside world and where pure logic might be extracted safely.
This step is still human-led. The agent can trace mechanics. The engineer decides which seams matter.
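For the service above, the first row of that table might look something like this (illustrative, not exhaustive; the column choices are one reasonable way to slice it):

```
| Method         | External dependencies                                                       | Configuration                        | Candidate seams                                                     |
| -------------- | --------------------------------------------------------------------------- | ------------------------------------ | ------------------------------------------------------------------- |
| processOrder() | EntityManager, RedisTemplate, RestTemplate, Files.write, NotificationClient | pricing.api.url, reports.output.dir  | discount calculation, pricing lookup, report writing, notification  |
```

The discount calculation stands out immediately: it is the only piece with no external dependency at all, which makes it the cheapest seam to extract first.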
2. Write characterization tests before refactoring
Most teams try to jump straight to unit tests. That usually fails because the code was never designed for clean unit boundaries.
What works better is characterization testing: capture current behavior first, even the ugly parts.
@SpringBootTest
@Transactional
class OrderProcessingServiceCharacterizationTest {

    @Autowired OrderProcessingService service;
    @Autowired EntityManager entityManager;

    @Test
    void goldCustomer_noCoupon_getsTenPercent() {
        Customer customer = new Customer();
        customer.setId(1L);
        customer.setTier("GOLD");
        entityManager.persist(customer);

        OrderRequest request = new OrderRequest(1L, new BigDecimal("100.00"), null);
        OrderResult result = service.processOrder(request);

        assertThat(result.getDiscount()).isEqualByComparingTo("10.00");
    }

    @Test
    void couponOnSmallOrder_discountExceedsTotal() {
        Customer customer = new Customer();
        customer.setId(5L);
        customer.setTier("SILVER");
        entityManager.persist(customer);

        OrderRequest request = new OrderRequest(5L, new BigDecimal("10.00"), "SAVE20");
        OrderResult result = service.processOrder(request);

        // Current buggy behavior, pinned intentionally
        assertThat(result.getDiscount()).isEqualByComparingTo("20.50");
    }
}
The point is not to bless the bugs. The point is to make current behavior visible so the agent stops changing behavior by accident.
This is where coding agents help a lot. Once you define the scope clearly, they are effective at generating repetitive characterization coverage faster than most humans want to do it by hand.
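Much of that coverage can even run without Spring by pinning the pure arithmetic directly. Here is a minimal sketch, assuming a Java 16+ toolchain; `DiscountCase` and `currentDiscount()` are hypothetical names, and the logic is copied verbatim from `processOrder()` so any drift shows up immediately:

```java
import java.math.BigDecimal;
import java.util.List;

public class DiscountCharacterization {

    // One pinned observation of current behavior.
    record DiscountCase(String tier, String total, String coupon, String expected) {}

    // Replicates the current discount logic verbatim, bugs included.
    static BigDecimal currentDiscount(String tier, BigDecimal total, String coupon) {
        BigDecimal discount = BigDecimal.ZERO;
        if (tier.equals("PLATINUM")) {
            discount = total.multiply(new BigDecimal("0.15"));
        } else if (tier.equals("GOLD")) {
            discount = total.multiply(new BigDecimal("0.10"));
        } else if (tier.equals("SILVER")) {
            discount = total.multiply(new BigDecimal("0.05"));
        }
        if (coupon != null) {
            discount = discount.add(new BigDecimal("20.00"));
        }
        return discount;
    }

    public static void main(String[] args) {
        List<DiscountCase> pinned = List.of(
            new DiscountCase("GOLD", "100.00", null, "10.00"),
            new DiscountCase("PLATINUM", "200.00", null, "30.00"),
            new DiscountCase("SILVER", "10.00", "SAVE20", "20.50"), // bug, pinned on purpose
            new DiscountCase("BRONZE", "50.00", null, "0")          // unknown tier: no discount
        );
        for (DiscountCase c : pinned) {
            BigDecimal actual = currentDiscount(c.tier(), new BigDecimal(c.total()), c.coupon());
            if (actual.compareTo(new BigDecimal(c.expected())) != 0) {
                throw new AssertionError(c + " -> got " + actual);
            }
        }
        System.out.println("All pinned cases hold");
    }
}
```

The agent can churn out dozens of rows like these once you hand it the table shape; the engineer only reviews the pinned values.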
3. Refactor one seam at a time
Once behavior is pinned, constrain the agent to one change at a time.
Extract the discount calculation logic from processOrder() into a package-private
method called calculateDiscount(String tier, BigDecimal total, String couponCode).
Do not change behavior. Run the characterization tests after the change.
If any test fails, stop and explain why.
That produces a smaller, verifiable step:
BigDecimal calculateDiscount(String tier, BigDecimal total, String couponCode) {
    BigDecimal discount = BigDecimal.ZERO;
    if (tier.equals("PLATINUM")) {
        discount = total.multiply(new BigDecimal("0.15"));
    } else if (tier.equals("GOLD")) {
        discount = total.multiply(new BigDecimal("0.10"));
    } else if (tier.equals("SILVER")) {
        discount = total.multiply(new BigDecimal("0.05"));
    }
    if (couponCode != null) {
        discount = discount.add(new BigDecimal("20.00"));
    }
    return discount;
}
Then move to the next seam. Not ten refactorings in one prompt. One seam, one verification cycle.
4. Extract interfaces around infrastructure
After the logic starts separating, extract the external concerns:
- pricing lookup
- report writing
- notification sending
- repository access
That turns a monolith of mixed concerns into testable components:
public interface PricingService {
    PricingRules getRules(String tier);
}

public interface ReportWriter {
    void writeOrderReport(Order order);
}
Now the main service becomes smaller, clearer, and more mockable. More importantly, the agent now has cleaner boundaries for future changes.
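With those seams in place, the core pricing flow can be exercised with an in-memory fake instead of Redis and HTTP. A minimal sketch; `FixedPricingService` and `finalAmount()` are illustrative names, and the interface is simplified here to return the markup directly rather than a full `PricingRules` object:

```java
import java.math.BigDecimal;

public class SeamDemo {

    // Simplified seam for this sketch: just the markup, no caching concerns.
    interface PricingService {
        BigDecimal getMarkup(String tier);
    }

    // Deterministic test double: no Redis, no HTTP, no cache-miss branch.
    static class FixedPricingService implements PricingService {
        public BigDecimal getMarkup(String tier) {
            return new BigDecimal("5.00");
        }
    }

    // The pricing flow, now expressed against the seam instead of infrastructure.
    static BigDecimal finalAmount(PricingService pricing, String tier,
                                  BigDecimal total, BigDecimal discount) {
        BigDecimal adjusted = pricing.getMarkup(tier).add(total);
        return adjusted.subtract(discount);
    }

    public static void main(String[] args) {
        BigDecimal result = finalAmount(new FixedPricingService(), "GOLD",
                new BigDecimal("100.00"), new BigDecimal("10.00"));
        System.out.println(result); // markup 5.00 + total 100.00 - discount 10.00
    }
}
```

In the real codebase the production implementation of the seam would wrap the existing cache-then-API lookup, so behavior stays identical while tests get a fake.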
5. Only then let the agent modernize
Once the safety net exists and the seams are exposed, the agent becomes useful for the higher-leverage cleanup:
- replace brittle conditionals
- fix intentionally identified bugs
- add proper error handling
- isolate side effects
- improve naming and structure
At that point the agent is no longer guessing inside a black box. It is operating inside a checked workflow.
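One example of that cleanup: the tier if/else chain can collapse into an enum that owns its rate. A sketch with illustrative names (`Tier`, `tierDiscount()`); results are compared numerically so BigDecimal scale differences do not matter:

```java
import java.math.BigDecimal;

public class TierDiscount {

    // Each tier carries its own rate; unknown tiers fall back to zero.
    enum Tier {
        PLATINUM("0.15"), GOLD("0.10"), SILVER("0.05"), OTHER("0");

        final BigDecimal rate;
        Tier(String rate) { this.rate = new BigDecimal(rate); }

        static Tier from(String name) {
            try {
                return valueOf(name);
            } catch (IllegalArgumentException e) {
                return OTHER;
            }
        }
    }

    static BigDecimal tierDiscount(String tier, BigDecimal total) {
        return total.multiply(Tier.from(tier).rate);
    }

    public static void main(String[] args) {
        // Same results as the original if/else chain, checked by comparison.
        BigDecimal total = new BigDecimal("100.00");
        if (tierDiscount("GOLD", total).compareTo(new BigDecimal("10.00")) != 0
                || tierDiscount("BRONZE", total).compareTo(BigDecimal.ZERO) != 0) {
            throw new AssertionError("tier discount drifted from pinned behavior");
        }
        System.out.println("tier discounts match");
    }
}
```

Crucially, this change only becomes safe because the characterization tests from step 2 would catch any tier whose discount drifts.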
What this looks like in practice
This is not a one-prompt miracle. It is a disciplined sequence:
Phase 1 - Map the surface area
- identify dependencies
- find the seams
- separate pure logic from infrastructure-heavy methods
Phase 2 - Pin current behavior
- write characterization tests
- include current bugs if they are part of real behavior
- make unintended change visible
Phase 3 - Extract and simplify
- pull pure logic into testable units
- isolate external concerns behind interfaces
- reduce the blast radius of each change
Phase 4 - Modernize safely
- replace brittle structures
- fix known bugs intentionally
- improve error handling and boundaries
The exact timeline will vary by codebase. That is not the real point. The real point is that once the code is under test, the agent stops guessing and starts working inside a system that can reject bad changes quickly.
The uncomfortable truth about coding agents
Coding agents are not magic refactoring engines.
They are extremely fast implementation engines that still need:
- constraints
- verification
- small change boundaries
- human review
The teams getting real value from coding agents on legacy code are not the ones using the fanciest prompts. They are the ones applying old, boring, correct engineering discipline - seams, tests, small verified changes - and then letting the agent accelerate the mechanical work.
Michael Feathers' playbook still holds. What changed is that agents make the execution faster once the safety net exists.
If your coding agent cannot refactor your legacy code, do not blame the model first. Ask whether the code has a safety net.
At CoEdify, we use coding agents the same way we use any serious engineering tool: inside a checked workflow. On legacy systems, the difference between a useful agent and a dangerous one is almost always the quality of the safety net around it. [coedify.com]