
Note: We have no affiliation with Anthropic, Cursor, or any other AI vendor, and we received no credits, discounts, sponsorships or special deals. Costs reported here reflect the publicly available Cursor API pricing at the time we did the work.
We’re a small, two-person startup. I’m a developer with 10+ years of experience, plus another 10 in technical consulting where I wrote a fair amount of code. My cofounder is the non-technical domain expert. Our SaaS product is in the “field services” category: we help customers in our space generate quotes and invoices, manage tasks and projects, schedule work and track time. This is the story of how Claude Opus 4.5 unblocked our growth by tackling a massive amount of foundational tech debt that spanned the entire codebase + the database. With Opus doing the heavy lifting, I finished the project:
- within a span of just two and a half weeks, instead of four to six months
- at a small fraction of what it would otherwise have cost us to outsource
- with zero burnout
The justification for the estimated cost and time savings is at the very end.
The condensed project timeline made it feel like I was inside a tornado. I wrote almost none of the code, but reviewed literally everything and made all the major decisions. Below, I’ll do my best to provide all the relevant details that will allow the audience to appreciate the monumental, complex tasks involved and how Opus crushed them, while avoiding the nitty-gritty that might bore or confuse readers.
- Background
- Vendor Quotes
- Opus to the Rescue
- Part A: Creating Public Schema Tables and Objects
- Part B: Data Migration
- Part C: Code Refactoring
- Summary
- Conclusion
- Is the Title Actually Justified?
Background
I feel like every AI success story is met with cynical criticisms about lacking important context. So before even talking about the project itself, let me start by giving an account of how we got here.
When my cofounder and I started our company eight years ago, our target audience was medium to large companies. His employer was one such company, and the largest one of its kind in our state. He had been in the weeds for more than a decade and knew the company’s pain points inside out. When I casually mentioned my product idea to him, he got excited and suggested that his employer could actually be our first customer, so we teamed up.
I spent a few weeks building a proof of concept for creating quotes with line items coming from a shared product catalog, which is a core feature in most field service software. When we sat down with the CTO and showed it to him, he was instantly hooked. It was a huge improvement to the complex, localized Excel sheets they were using (which my cofounder had explained to me in detail). He started asking technical questions. He didn’t care about the tech stack, for which I’m glad because Elixir and Phoenix were relatively unknown back then. But one of his requirements was data isolation between their field offices. Apparently, they could not have data from their office in City A live in the same tables as data from their office in City B.
Our database of choice was Postgres and our infrastructure provider back then had a “one app, one database” limitation. I really didn’t want to manage each account with its own app instance and database combo, and I didn’t know enough about the stack yet to host everything myself on Digital Ocean or similar platform. After some investigation into various multi-tenant architectures, I settled on a middle ground: one database, but with a separate Postgres schema per tenant – a fairly common data topology in B2B SaaS. The CTO was satisfied and they signed up with us.
For the next seven years, having one schema per tenant worked well. Every function with a database operation in its call stack accepted a string parameter named tenant. This was used to generate a schema prefix to be used in the database queries. When a new tenant signed up, the server created a schema for them, then ran the tenant migrations under that schema to create all the tables and other objects. Similarly, when we added a new feature that had database migrations, those migrations ran for each tenant schema. With just several customer accounts, this took seconds.
But we wanted to grow the company. The problem is, selling software to large companies is time-consuming. We had known that at the start, but over the years our careers had evolved and we no longer had the kind of free time or flexibility that we originally had.
So earlier this year, we decided to shift our focus to small/medium sized businesses. We hired a marketing firm specializing in startups like us. Under their guidance, we made a large number of changes to the app over the course of several months to make it “self-serve”:
- Shortened and simplified onboarding by picking smart defaults
- Populated core tables with high-quality sample data based on the customer’s industry
- Overhauled and modernized the UI and simplified many UX flows
- Recorded videos, wrote documentation and linked them in every relevant page
- Completely redesigned our marketing pages to speak to the pain points of smaller companies, rather than large firms.
When September rolled around, we started a Google ads campaign to test our theory. We weren’t sure whether it would work or not. Truth be told, I’ve always been somewhat skeptical of paid marketing: people have told me that it is expensive and doesn’t work. A few have even claimed that it’s a total scam and that SEO is the way to go. So when I woke up the next morning and found ten emails in my inbox with subjects like “New Signup: <Company Name>”, I was stunned. The emails were from a simple Zapier automation I had configured back in the day that listened for changes in our Stripe customer list, then used our Sendgrid account to send me an email when it detected a new entry.
The next day, we got another dozen. And it went on like that for 6 weeks. By mid-October, we had gotten over 500 signups and had spent just $6,000. We were ecstatic because the volume was great and ~$12 per signup was way below our threshold. It was clear we had a viable path forward after changing our target customer profile.
But then something happened that gave me pause.
While the ad campaign was active, I was working on a new feature that included four new database tables and related objects. When I rolled it out that month and ran the tenant migrations, they took a whopping 10 minutes to finish! Considering the new tenants that we acquired via the ad campaign, the reason was obvious. Before, the migrate_all_tenants function, which ran automatically at the end of each deployment, created the necessary database objects for just the few large tenants we had. Now, for 500+ tenants, it had created more than 4,500 objects (4 tables plus indexes, FKs and constraints for each tenant), which took a while. During that time, tenants that had pending migrations kept getting runtime errors, as the deployed code ran queries on tables that did not yet exist in their schema. Not good.
Out of curiosity, I checked the tables view in Postgres’s built-in information_schema catalog, and found that the database had a little over 77,000 tables. I did the math: 500+ tenants, 150 tables each… yep. Other pg_catalog views similarly had thousands or tens of thousands of objects. I quickly realized that our one-schema-per-tenant topology was:
- not suitable for the type of growth we had gotten a taste of, because it would result in new features taking longer and longer to deploy as we got more customers,
- total overkill for our new target customer persona, because small companies in our market neither need nor care about things like schema-based data isolation.
I briefly considered trying to optimize our way out of it. Some companies successfully run tens of thousands of Postgres schemas across dozens of databases by sharding and continuously rebalancing, i.e. moving schemas between shards to keep the load even. But when I ran the numbers, it was a losing battle for a single-developer startup. If we hit our target of 3,000 customers by year’s end, even a well-optimized migration pipeline could still mean an hour of deploy time for any feature that required a database change. At 10,000 customers, it would be effectively unusable. The underlying issue was that we were shifting from an enterprise model (few tenants, high isolation) to a volume model (many tenants, rapid iteration), and our architecture needed to reflect that. Sticking with schema-per-tenant would have guaranteed that deployment velocity slowed down permanently as we grew. I found a Hacker News thread from 2020 where a lot of people were saying they had that exact problem, which helped me make up my mind.
I talked to my cofounder and we begrudgingly turned off the ad campaign. I didn’t want the problem getting worse while we considered our options. Over the years there had been times when I idly wondered whether I should migrate the app to a more standard topology, like with shared tables and row-level isolation. Each time, I had discarded the idea as an unnecessary distraction; we had new features that needed to be built, and we didn’t have time for that kind of open-heart surgery. Now though, it had become a necessity.
The problem? The codebase is more than eight years old and fairly large at this point, at least for a startup with a single developer! There are more than 200k lines of Elixir code in the backend, as well as a comprehensive test suite (55k lines) providing ~80% test coverage. There’s a decent amount of data to migrate as well. It was clear that we were looking at a major project regardless of how we sliced it.
I spent a weekend taking stock of what a migration and refactor would even entail. My research turned up three options:
- Migrate code and its data to the public schema one domain at a time. I didn’t like this option for a few reasons. For example, foreign-key relationships between tables would break if some records were migrated to the public schema and others remained in tenant schemas. We might be able to handle query joins manually during the transition, but we would also need to handle a dozen other things like cascading deletes manually, which introduced serious data integrity risk.
- Use tenant-based feature flags and migrate customers to the new topology/architecture in batches. This would require a separate set of modules that worked with the public schema. We would migrate a customer’s data, then turn on the feature flag for them to reroute the logic to the newer modules, which would read from and write to the public schema tables. This was horrifying to even think about, as it would be operationally complex and drastically increase the size of an already large codebase.
- Do it in one go, testing and validating along the way as much as possible. We could use synthetic data as well as recent backups of the staging and prod databases for more realistic data. This came with its own challenge: the need to maintain a long-lived feature branch and keep it in parity with any changes that happened on the main branch.
After mulling it over, I settled on the third option. A “big bang” approach like that would certainly be risky, but it would in most ways be the simplest. And I figured most of the risks could be mitigated with comprehensive testing and validation before rollout, combined with a robust backup and rollback plan in case things went sideways.
This brought me to the next problem: the task itself was simply monumental. It entailed setting up 150 new tables and their objects inside the public schema with perfect accuracy and full parity, writing the data migrations and validations with no errors or omissions, and then refactoring almost every module in the web and data layers along with their tests. Every single part of the codebase would need to be carefully reviewed, updated and tested. On top of that, since I have a day job, I could only work on this on evenings and weekends. No matter how I sliced it, I knew I was looking at months of effort, with high risk of stalling or getting burned out.
Vendor Quotes
Dismayed, I explained this to my cofounder. He didn’t like the idea of a months-long refactor that would pin us down, so he suggested contacting a few firms with experience in our tech stack. Even if we didn’t accept their proposals, it would give us a sense of how big the scope really is in terms of implementation time. Frankly, I could use the sanity check, because there was the possibility that I was massively overcomplicating it, and perhaps there was a much easier way.
Elixir is not a widely used language, but we were able to get in touch with three vendors, one from Brazil and two from the US. After each requirements-gathering call, where I gave the vendor team a detailed overview of what was needed, we asked for quotes that included the following deliverables:
- Replication of tenant tables and other objects under the public schema
- Migration of data from tenant tables to the public schema
- Validation of full data parity and integrity for all 500+ tenants
- Refactoring:
  - Ecto schema files (169 files)
  - Context modules containing all the database operations and associated business logic (39 files)
  - Helper/util modules (46 files)
  - Controllers (156 files)
  - Background jobs (29 files)
  - JSON views and HTML templates
  - Auth and signup flow
  - Test suite (180 files containing 1,763 test cases)
- Deployment and testing on Staging to validate
- Knowledge transfer / onboarding
- Project management (we requested weekly meetings)
While we waited for the quotes, I also gave a summary of the project requirements to three AI agents: Sonnet 4.5, GPT 5.1 and Gemini 2.5 (Opus 4.5 had not come out yet). I just wanted to know what I should expect, at least at a high level. Each agent performed a comprehensive analysis of the codebase and replied with an hourly breakdown of each item, as well as estimated cost. The details were somewhat different, but all three agents agreed on the big picture: that this would be a 1,000-1,500 hour project, would cost six figures and take 4-6 months.
Amazingly, when we received the quotes the following week, two of the three fit this range perfectly:
- Vendor A quoted us $120,000 and 16-18 weeks
- Vendor B quoted us $157,500 and 5 months
- Vendor C quoted us $75,000 and 3 months
Vendor C’s quote missed some items from the scope. It also included some odd wording that suggested they didn’t fully understand the project or have any relevant experience. Vendor B’s quote was decent, but we did some sleuthing and found some not-so-good reviews about them. So we eliminated both.
We were left with Vendor A’s quote. As a former consultant, I thought their hourly breakdowns were the most realistic and their “Risks” section was the most thorough. There was one major issue though: we are a small startup, and the quoted price was almost equal to our annual revenue! Yes, outsourcing the project would free up my time, at least in theory. In reality though, that freedom would be limited, as any bug fixes or greenfield features that included database migrations would be off the table, since they would affect the migration project’s scope.
It was a frustrating situation. I had made a big decision about data topology when we were first starting out eight years ago. It was arguably fine and perhaps even necessary at the time, as that first customer ended up with a hundred users, which helped us pay the bills and encouraged us to keep developing the product. Now though, the pivot in our business strategy had caused the original decision to turn into a substantial piece of tech debt that was blocking our growth. Not only that, it would cost a ton of money and time to undo.
Or would it?
Opus to the Rescue
On the same week that we rejected all the vendor proposals, Anthropic released Claude Opus 4.5, and it felt like a bit of a watershed moment. Whereas with previous model releases, there would be a mix of positive and negative first impressions on the Internet, with this one virtually everyone in my circle seemed to be raving about it. I think there were four submissions on the front page of Hacker News at one point, and people on Twitter/X were going nuts too. When I noticed this, I texted one of my friends – a Staff Software Engineer at a large startup – to ask him what he thought of it. He replied “for the first time since AI came out, I’m actually a bit worried, because this thing did literally all of my coding work for me today, and none of it was trivial.”
So, that Friday night, I decided to take it for a spin.
Day 1
For the past year, I’ve been using AI heavily every day for all kinds of work-related tasks: bug fixes, new features, refactoring, writing tests, writing documentation and help articles, you name it. However, attempting a six-figure project with it seemed insane even to me. It wasn’t a low-risk greenfield feature either: if we messed it up, it could result in database corruption, data leakage, security vulnerabilities… it was simply too risky to involve AI.
So what I set out to do was to use it as a sounding board and see if it had any interesting insights. I figured a fourth opinion wouldn’t hurt, especially when it takes just a few minutes and costs almost nothing.
I started by describing the product’s features at a high level, as well as the one-schema-per-tenant data topology, our reason for choosing it in the beginning, and why it no longer fit our purpose. I also explained in broad strokes what we were looking to achieve with the migration. In order to avoid introducing bias, I refrained from giving it the vendor quotes. I’ve learned that being careful about “polluting” an AI’s context is crucial for getting good results. I submitted my prompt into Cursor’s chat box.
Opus explored the codebase, both backend and frontend. It then did some web searches and ran queries in the dev database to validate its understanding. After a few minutes, it responded with:
- detailed breakdown of the current data topology and how it shaped the app’s architecture
- the full scope of the requirements to migrate everything to the public schema
- a list of justifications for why a migration made sense in light of our new business strategy
- a comprehensive list of risks and how to mitigate each one using relevant defense-in-depth strategies
- a safe, step-by-step rollout strategy when the time came
This response was significantly more thorough than that of the AI agents I had previously consulted. It included not just code level concerns, but also advantages and disadvantages related to infrastructure, developer experience, scalability, security and business operations. It was at the level of design documents that my senior/principal developer coworkers produce at my day job. So I was intrigued.
But wait. Was it agreeing with the need for a migration because of Claude models’ notorious tendency towards sycophancy? After using various Sonnet versions for a full year, I have become well aware of the “you are absolutely right!” problem. To counter it, whenever I get the impression that the agent is being a little too agreeable and affirming, I switch to adversarial questioning and try to poke holes in its reasoning. Despite my skeptical questioning and pushback, Opus held firm and validated our plan. It even highlighted several parts of the codebase that would drastically benefit from a migration. For example, it pointed out that we could really simplify webhook processing, as there would no longer be a need for extra lookup tables in the public schema and related code for mapping webhooks to tenants. We could also run a lot more tests in parallel, because creating tenants in the test suite would no longer require running tenant migrations. I was once again surprised and impressed, as neither of these had occurred to me, the three vendors, or the other AI agents.
Day 2
Boosted by a surge of courage and motivation, I woke up early the next morning, grabbed a cup of coffee, and got to work. I ended up spending most of the day creating three documents with Opus:
- Part A: Strategy for creating the new tables and other objects inside the public schema
- Part B: Strategy for migrating and validating the data
- Part C: Strategy for refactoring the codebase and test suite
The documents were comprehensive, each about 2,000-2,500 lines long, written for an audience of other AI agents (which would also be Opus 4.5s) that would be doing the grunt work. Opus wrote these documents in chunks. I read each chunk several times, asked questions about it, made small corrections, and ultimately approved it. My experience has taught me that this is a much better approach than asking for one massive document, because you get to ask questions and make corrections at regular intervals, which prevents the model from going too deep down the wrong rabbit hole. Opus made very few mistakes and omissions, and they were mostly minor.
Now, this article is about Opus 4.5, but I want to come clean and admit that I cheated a bit here. Once all three documents were complete, I gave them to Gemini 3 (which had been released a week before Opus 4.5) and GPT 5.1. I asked each model for a comprehensive adversarial review and input. After all, every model has its strengths and weaknesses. In my prompt, I specifically said, “be ruthless in your feedback and critique.” For a project as crucial as this one, I wanted to cover all my bases.
Similar to Opus 4.5, both models explored the codebase and ran database queries and local functions in Cursor using available MCPs. In the end, Gemini 3 said the documents were “unusually thorough” and did not find any major issues, but suggested adding some extra indices and unique constraints to the new tables. I decided to handle those post-migration once the dust settled. GPT 5.1 also remarked on the comprehensiveness and quality of the plans and said they describe a “textbook migration project”. It suggested a small revision to one of the documents, but after some back and forth, it realized that it had misunderstood the relevant requirement and said the original approach was actually sound.
Part A: Creating Public Schema Tables and Objects
Just a day and a half prior, after we turned down the vendor quotes, I had been convinced the project would eat up months of my time. It looked certain that the stress and tediousness would wear down my energy and motivation. That was particularly disheartening because we had just tasted a measure of success with our ad campaign. That evening, though, after all the SOTA models put their stamps of approval on the plan docs Opus 4.5 wrote, I was feeling pretty confident, perhaps even eager. Exhausted too. Carefully reading thousands of lines of specs and iterating on them to make sure they were accurate is mentally draining. So I decided to take the rest of the evening off.
Day 3
The next morning, I fired up Cursor again, created a new git branch, and put Opus to work.
For creating the public schema tables and objects, the original Opus session I had authored the docs with (call it Opus A) had documented two crucial insights.
First, the new tables would need to be created with perfect accuracy and correctness. In other words, they needed to mirror the current state of the corresponding tenant tables. The approach Opus suggested was elegant in its simplicity: to query Postgres’s information_schema for all objects inside the reference tenant schema and write the new migrations based on the results. The idea was to fetch a list of tables from one of the tenants (since they all had identical tables), then for each table fetch that table’s columns, indexes and so on. Then the findings would be used to generate new migration files, one for each table and its objects, along with supplementary columns (e.g. tenant_id). Any schema-level objects, such as functions, would also get their own migration file.
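To give a sense of what that looks like in practice, here is a minimal sketch of the kind of introspection involved (illustrative only: the reference schema name, the repo module, and the exact queries are stand-ins, not our actual code):

```elixir
# Enumerate the reference tenant's tables, then pull column definitions for
# each one; the results drive the generated public-schema migrations.
reference_schema = "tenant_acme_inc"

%{rows: tables} =
  MyApp.Repo.query!(
    "SELECT table_name FROM information_schema.tables WHERE table_schema = $1",
    [reference_schema]
  )

for [table] <- tables do
  MyApp.Repo.query!(
    """
    SELECT column_name, data_type, is_nullable, column_default
      FROM information_schema.columns
     WHERE table_schema = $1 AND table_name = $2
     ORDER BY ordinal_position
    """,
    [reference_schema, table]
  )
  # ...same idea for indexes and constraints (pg_indexes, pg_constraint),
  # then emit one Ecto migration file per table from the results
end
```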
Second, it had realized that there were complex dependencies between the tables, including some that had self-referential FKs. This was important because I had instructed it to create one migration file per table and its objects. Therefore, migrations would need to be created in a specific order. For example, if an invoice record has a user_id foreign key, that means the users table would need to be created first so the user_id field could reference it. Opus also understood that the data migration functions would need to run in a specific order as well, and noted that in the refactor document. Among the 150 tables we were looking to migrate, it identified 9 total layers of dependencies, starting with “Level 0” tables with no FKs and ending with complex “Level 8” join tables for records that themselves were associated with multiple other records.
One of the requirements was that the PK scheme would change from bigint to UUID. This was going to be a bit tricky, because we were using integer PKs as user-friendly display numbers in URLs and UIs. So if a user navigated to /acme_inc/invoices/154, the system did a lookup inside the “tenant_acme_inc” schema for an invoice with that ID. I’m of course simplifying (there’s authentication and other lookups), but overall it was pretty straightforward and worked well enough. In the new system, PKs are UUIDs, which don’t fit that purpose at all. In addition, we needed to maintain backwards compatibility, because there were a lot of existing URLs out there (email notifications, bookmarks, PDFs, etc.) and they needed to continue working.
So Opus came up with the idea of adding a display_number column to each record that would hold the current ID of that record in the corresponding tenant schema table. In this scheme, an invoice with id 154 inside the acme_inc schema would become invoice with ID f81d4fae-7dec-11d0-a765-00a0c91e6bf6 and display_number 154 and tenant_id 5 (for Acme). Not only that, but Opus also correctly realized something about these display numbers: it should be possible for both Acme Inc and Bob’s Flowers to have an invoice with display number 154, and when Acme creates invoice 155, the invoice counter for Bob’s Flowers should not increment. In the one-schema-per-tenant topology, this was trivial: each tenant had its own schema, each schema had its own tables, and each table had its own PK sequence. In the new topology, it needed to be a bit more hands on.
Opus also realized that we needed tenant-scoped numbering (no collisions across tenants) and that display numbers needed to be gapless, which ruled out Postgres sequences in favor of a transactional counter table. We could have created a sequence per tenant per record type, but that felt like an anti-pattern: the entire goal of this migration was to stop creating (and maintaining) new database objects per tenant. So Opus settled on a tenant_sequences table combined with a trigger function tied to record insertions for tables that need display numbers. The trigger function takes the tenant_id and record_type as inputs, increments the `current_value` column for the matching row, then assigns the new value as the display_number of the created record. This is both transactional and rollbackable. It doesn’t have stellar performance, but it’s more than sufficient because display numbers are assigned only once (at record creation) and only on certain record types. Just to be safe, I made sure all display_number columns were defined as NOT NULL and that each relevant table had a unique constraint on (tenant_id, display_number).
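To make this concrete, here’s a hedged sketch of what such a migration can look like (table, column and function names are illustrative, and details like seeding the counters are omitted):

```elixir
defmodule MyApp.Repo.Migrations.CreateTenantSequences do
  use Ecto.Migration

  def up do
    create table(:tenant_sequences) do
      add :tenant_id, :bigint, null: false
      add :record_type, :string, null: false
      add :current_value, :bigint, null: false, default: 0
    end

    create unique_index(:tenant_sequences, [:tenant_id, :record_type])

    # Trigger function: lock and bump the counter row for (tenant_id, record_type),
    # then assign the new value as the inserted row's display_number.
    execute """
    CREATE OR REPLACE FUNCTION assign_display_number() RETURNS trigger AS $$
    BEGIN
      -- rows copied over by the data migration arrive with display_number already set
      IF NEW.display_number IS NOT NULL THEN
        RETURN NEW;
      END IF;

      UPDATE tenant_sequences
         SET current_value = current_value + 1
       WHERE tenant_id = NEW.tenant_id
         AND record_type = TG_ARGV[0]
      RETURNING current_value INTO NEW.display_number;

      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    """

    # Example: wire the trigger up for one table that needs display numbers
    execute """
    CREATE TRIGGER set_invoice_display_number
      BEFORE INSERT ON invoices
      FOR EACH ROW EXECUTE FUNCTION assign_display_number('invoice');
    """
  end

  def down do
    execute "DROP TRIGGER IF EXISTS set_invoice_display_number ON invoices;"
    execute "DROP FUNCTION IF EXISTS assign_display_number();"
    drop table(:tenant_sequences)
  end
end
```

The row-level UPDATE is what makes this safe under concurrency: two inserts for the same tenant and record type serialize on the same counter row, so numbers can’t collide, and a rolled-back transaction rolls its counter bump back with it.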
Still, I was a bit paranoid. My biggest concern was that the new migrations would omit something crucial, like an important index for a table. If that happened, it would almost certainly result in performance issues (since tables would be shared now, meaning many more records per table) and users would complain. Or what if it omitted a unique constraint, or even entire columns?
So I started a separate chat session, also with Opus 4.5, which I will refer to as Opus B. I gave it the Part A document, explained my concern, as well as what Opus A had done so far. It understood what was needed right away, and wrote a migration validator function that would:
- Run the migrations Opus A had written to create the new tables in public schema
- Iterate over all the tables inside the tenant schema we were using as a reference
- Cross-reference each tenant table against the new public schema tables to make sure:
- All columns were carried over with correct types (except PKs and FKs, which became UUIDs)
- All existing FK relationships were preserved
- All existing indexes and constraints carried over
- New columns were added (tenant_id on every table, and display_number for tables that need it)
- New composite indexes and unique constraints were added for the new tenant_id and display_number columns
- tenant_sequences had the correct current_value for each record type and tenant (i.e. if Acme had reached invoice ID 155 in their tenant schema, their post-migration invoice counter would start at 156)
- Roll back the migrations to revert the database to its original state
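The core of that cross-referencing logic looked roughly like this (a simplified sketch with illustrative names; the real validator checks far more than column presence):

```elixir
# Compare column definitions between the reference tenant schema and the new
# public table; raise on anything missing.
defp column_map(schema, table) do
  MyApp.Repo.query!(
    """
    SELECT column_name, data_type, is_nullable
      FROM information_schema.columns
     WHERE table_schema = $1 AND table_name = $2
    """,
    [schema, table]
  ).rows
  |> Map.new(fn [name, type, nullable] -> {name, {type, nullable}} end)
end

defp validate_columns!(reference_schema, table) do
  old_cols = column_map(reference_schema, table)
  new_cols = column_map("public", table)

  missing = Map.keys(old_cols) -- Map.keys(new_cols)

  unless missing == [] do
    raise "public.#{table} is missing columns: #{inspect(missing)}"
  end

  # ...similar checks for types (allowing the bigint -> uuid PK/FK change),
  # indexes, constraints, FKs, and the new tenant_id / display_number columns
end
```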
This validator caught two minor omissions: Opus A had indeed missed one index in one of the tables, and had misnamed a unique constraint. Upon investigation, we found that the omitted index was not actually used by any of our queries – it was completely unnecessary. The misnamed unique constraint was not a huge issue either: on the off chance it went live, a violation would simply result in a query error that Ecto couldn’t convert to a user-friendly message. The constraint itself, though, would still work. Overall, these mistakes were not bad at all, considering that over a thousand objects were being migrated to the public schema!
Part B: Data Migration
With the tables in place and confident in their parity with their tenant schema counterparts, I decided to move on to the next stage: writing the functions to migrate the data from the tenant tables to the new tables in the public schema.
It was 7 PM on Sunday. The chat session with Opus A had been going on since that morning, and it was starting to get a bit confused. Over the past year of using AI almost every day, I’ve learned to recognize the symptoms: there’s always a point where I start having to repeat myself or issue clarifications or corrections to statements it makes, and it gets worse as time goes on. Cursor also slows down a lot, and can actually become unstable. So I started a new chat session and gave it the three markdown documents, along with the folder location of the new migration files.
Based on the docs, Opus C understood what needed to be done next. So, over the next hour, it wrote stubs for the data migration functions for each level. Once filled in, these would:
- Migrate data from each tenant table to the corresponding new public schema table and assign the appropriate tenant_id to every migrated record
- For new tables that now have a display_number column, transfer the integer ID of the original record to that column
- For tenant tables that previously had custom-generated UUIDs, make them the PK IDs of the newly migrated records
- For tenant tables that had self-referential FKs, ensure the parent records are created first, then their child records
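To give a flavor of what those functions looked like once filled in, here’s a heavily simplified sketch of a single per-table step (the column list and the customers join are illustrative, not our actual schema; gen_random_uuid() assumes Postgres 13+ or pgcrypto):

```elixir
# Copy one tenant's invoices into the shared public table: new UUID PKs,
# old integer PK preserved as display_number, FKs remapped via the
# already-migrated parent rows (hence the level ordering).
defp migrate_invoices(tenant_schema, tenant_id) do
  MyApp.Repo.query!(
    """
    INSERT INTO public.invoices
      (id, tenant_id, display_number, customer_id, total, inserted_at, updated_at)
    SELECT gen_random_uuid(),   -- new UUID primary key
           $1,                  -- tenant_id for row-level scoping
           i.id,                -- old integer PK becomes the display number
           c.id,                -- remapped FK: the customer's new UUID
           i.total, i.inserted_at, i.updated_at
      FROM #{tenant_schema}.invoices AS i
      JOIN public.customers AS c
        ON c.tenant_id = $1 AND c.display_number = i.customer_id
    """,
    [tenant_id]
  )
end
```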
Day 4
The next evening, after I got off work, implementation began. After Opus C finished implementing the data migrations in a given level (which, as mentioned previously, was determined by the dependencies between entities), I ran the migration pipeline from the start up to that point. If any errors came up, I gave them to Opus C to fix, nudging and providing guidance as necessary.
These errors were due to the following:
- Some of the older columns named “uuid” were actually varchar and stored short UUIDs instead of regular UUIDs. In retrospect, that was a bad decision: the column type should have been uuid and stored a real UUID, with the encoding to/decoding from short UUIDs handled at runtime. Since the column names still said “uuid”, Opus C understandably assumed they contained real UUIDs. When the migration function ran, Postgres rejected an incoming value as an invalid uuid. The fix was easy: change the code in each migration function to decode those short UUIDs back to regular UUIDs before storing the values in the new PK columns (see the sketch after this list).
- For two of the tables, Opus A had not gotten the dependency levels quite right. When the data migration failed, Opus C realized this and shifted them around. When one of its fixes didn’t work, it performed another analysis and this time realized that the order of the migrations even inside a given level could matter, and adjusted the order of the function calls accordingly, which resolved the issue.
- In one table, it turned out we had corrupt data. Opus C investigated and determined the cause to be a rare race condition in app code that resulted in duplicate records being created. Due to the way our queries were written, this had never surfaced at runtime, but the duplicates tripped up one of the new unique constraints in the public schema table. Fortunately, the corruption was rare enough that the duplicates could be removed by hand from each database (which we did in our local databases first to validate).
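The short-UUID fix mentioned in the first item boiled down to a small normalization step in each affected migration function, roughly like this (a hedged sketch; ShortUUID stands in for whichever short-UUID encoder is actually in use):

```elixir
# Normalize a legacy varchar "uuid" value to the canonical 36-character form
# before using it as the new UUID primary key.
defp to_canonical_uuid(value) do
  case Ecto.UUID.cast(value) do
    {:ok, uuid} ->
      # already a standard UUID string
      uuid

    :error ->
      # short form: decode it back to a standard UUID
      {:ok, uuid} = ShortUUID.decode(value)
      uuid
  end
end
```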
It also flagged one issue it came across when migrating one of the tables. The table had an integer array column named user_ids. Opus C suspected that this contained references to users, and was able to quickly confirm its suspicion. The problem? Users don’t have display numbers, so trying to find users by integer ids in the new tables would result in errors, since the new IDs are UUIDs. After finding two other tables with similar columns, Opus C suggested writing a helper function that would automatically convert the values during the migration (since those records had already been migrated in earlier levels).
By the end of Thursday evening, the data migration pipeline was finished and ran start to finish without errors. That, of course, wasn’t enough. The migrated records would need to be validated thoroughly.
Day 8
So on Friday evening after work, I started another chat session and had Opus D write a data validation function that did the following:
- Validation 1: For each tenant, iterate through each table in its schema, count its rows, then run `SELECT count(id) FROM <table> WHERE tenant_id = $1` against the corresponding public schema table and make sure the counts match.
- Validation 2: Make sure integer PKs from tenant tables got correctly transferred to the display_number columns for the new tables that have them.
- Validation 3: Make sure any custom UUIDs in the old tables became the PKs of the new tables.
- Validation 4: Check foreign keys to ensure their integrity. An invoice that belonged to Customer 168 for tenant Acme Inc should continue to belong to that tenant and customer post-migration.
- Validation 5: For the first and last 100 migrated records in each table, compare them to their pre-migration counterparts, column by column, and ensure there is full parity in values. This was meant as more of a sanity check than anything else.
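As an illustration, the heart of Validation 1 was little more than a pair of counts per tenant and table (simplified sketch, illustrative names):

```elixir
# Row counts must match between the tenant schema table and the tenant's
# slice of the new public table.
defp validate_counts!(tenant_schema, tenant_id, table) do
  %{rows: [[old_count]]} =
    MyApp.Repo.query!("SELECT count(id) FROM #{tenant_schema}.#{table}")

  %{rows: [[new_count]]} =
    MyApp.Repo.query!(
      "SELECT count(id) FROM public.#{table} WHERE tenant_id = $1",
      [tenant_id]
    )

  if old_count != new_count do
    raise "count mismatch for #{tenant_schema}.#{table}: #{old_count} vs #{new_count}"
  end
end
```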
This function failed Validation 1 for one of the Level 8 join tables, which I told Opus C about. After investigating, it found that the join tables in the tenant schemas did not have the correct ON DELETE setting, causing join records to become orphaned when the records they referenced were deleted. It recommended deleting the orphans before the migration, as join records with null FKs are essentially stale and useless. It also double-checked the new join table in the public schema and confirmed it had the correct ON DELETE behavior. Aside from this, the validation function found no issues.
Part C: Code Refactoring
Day 9
The next morning, with an entire weekend ahead of me, I was ready to start Part C. As mentioned previously, the codebase is fairly large, at least for a one-developer startup: ~200k lines of code plus 55k lines of tests on the server, covering about 18 major domains like invoices, quotes, file uploads, customers, accounts, and so on. I knew that this part of the project would take the longest, especially since we would also need to refactor the test suite for the new APIs.
My hunch turned out to be correct. Overall, the refactoring took about ten days. Every day or evening, I had an Opus agent look at the next one or two domains to refactor. Each domain contained:
- One or more Ecto schema modules that describe the corresponding tables and their fields, as well as changeset functions
- One “context” module that had CRUD operations and complex business logic
- Any modules responsible for background jobs (we use Oban)
- Controller modules
- View modules for JSON
- HTML templates for static pages, emails and PDFs
- ExUnit tests (which, despite the name of the library, include all types of tests)
For these refactors, I made extensive use of Cursor’s ‘Plan Mode’, which to my understanding enhances the agent with extra tools and instructs it to create high-fidelity markdown documents. What makes Plan Mode really nice is that after researching what the task would entail, the agent can ask questions for clarification or major decisions (like “how do you want to handle such-and-such scenario?”). Each generated plan document contains a phase-by-phase approach to the task, with as much detail as the model will need, as well as a corresponding to-do list. Once you review it and are happy with the result, you click a button on the UI labeled “Build” and the agent happily takes over from there.
After refactoring each domain’s modules, the agent also wrote a markdown document containing a summary of changes. These summaries included the following:
- Description of starting state, e.g. “previous run refactored domains A and B, and previous agent identified domain C as the next logical step”
- List of modules refactored
- List of patterns used, along with examples (e.g. using tenant_id to scope queries, removing custom UUID generation, etc.)
- Any refactoring tasks that proved to be trickier than anticipated, the reasons for it, and how those challenges were overcome
- Any logic that had to be temporarily disabled due to associated domains pending refactor
These worked as a sort of running memory. The following day, I would give that summary to a new Opus chat, along with the migration docs for context, and it would do the same thing:
- Read the summary of what the previous agent did
- Write a plan in Plan Mode for the next domain(s), using the same patterns and idioms as the previous agent (which were documented in the summary)
- Implement that plan until all tests pass
- Write a new summary
The first of these Opuses – Opus E – realized that, just like the migration itself, the refactoring had to be done in a certain order so it could proceed piecemeal without compilation errors and without runtime errors in the test suite. There were complex dependencies and interactions between domains and their modules. For example, we couldn’t refactor the Projects domain before we migrated Properties (job sites), because each Project requires (i.e. takes place at) a Property, and so they need to live in the same Postgres schema for that validation to work. Similarly, converting a quote to an invoice meant we had to temporarily comment out their dependency until both domains were refactored. Opus E determined that the dependency levels used in the migration should also be used for the refactoring, and overall that turned out to be a sound decision.
We also took the opportunity to completely refactor the test suite by having Opus:
- Migrate all of our custom test fixtures to standard test factory methods. This was itself a substantial undertaking, as a lot of associated tech debt had accumulated over eight years
- Determine which tests were now safe to run in parallel, and which could easily be refactored to be parallel-safe
- Add new comprehensive tests to ensure complete tenant isolation when reading and writing data
- Add even more tests to ensure complete data isolation during authentication and related logic (e.g. 2FA, password reset, etc.)
Ensuring Proper Tenant Isolation
Previously, tenant isolation was not something we had to think hard about, since Postgres schemas provided “natural” isolation. Querying was a simple matter of taking a tenant string out of a request header, authenticating it, and using it as the schema name (following the “tenant_acme” convention) when running queries. If we somehow forgot to include it, we would catch it quickly: without an explicit schema, Postgres runs the query against the public schema, which didn’t have that table, so we got an instant runtime error. Now, though, we had to be extremely careful to ensure every query was scoped by tenant_id. Otherwise, one tenant’s records would be visible to every other tenant.
First, I considered using Postgres Row-Level Security (RLS). On paper, it sounded like the perfect solution: push the security policy down to the database kernel so it’s impossible to forget. However, in my experience, RLS introduces quite a bit of friction when using connection pooling. Since the database user is shared across all requests, you have to verify that the tenant context is explicitly set (and unset!) on every single connection checkout. If that context-switching logic ever fails, you risk leaking data on a shared connection. I decided that managing this hidden state was operationally more complex and error-prone than simply making our application code explicit about what it was fetching.
So I settled on Ecto’s prepare_query callback, which gave us the best of both worlds: explicit queries with an automatic safety net. We implemented a hook that intercepts every read operation before it hits the database. It introspects the query’s target schema, and if that schema has a tenant_id field, it strictly enforces that a tenant_id filter is present in the query. This means that if I or an AI agent writes a “naked” query like Repo.all(Invoice), the application raises a hard exception at runtime rather than silently returning all tenants’ data.
```elixir
# Before
Repo.all(Invoice, prefix: build_prefix(tenant))

# After
Repo.all(Invoice, tenant_id: tenant_id)
```
With the callback we implemented, if Repo.all is called without tenant_id, an exception is raised.
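For reference, here’s a minimal sketch of what such a hook can look like (simplified, with illustrative names like skip_tenant_check; our actual callback handles more cases, such as bulk operations and a few intentionally global tables):

```elixir
defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres

  import Ecto.Query

  @impl true
  def prepare_query(_operation, query, opts) do
    cond do
      # explicit escape hatch for admin/maintenance tasks
      opts[:skip_tenant_check] ->
        {query, opts}

      # the target schema is tenant-scoped: require and apply :tenant_id
      tenant_scoped?(query) ->
        case Keyword.fetch(opts, :tenant_id) do
          {:ok, tenant_id} ->
            {where(query, [q], q.tenant_id == ^tenant_id), opts}

          :error ->
            raise "missing :tenant_id option for tenant-scoped query: #{inspect(query)}"
        end

      true ->
        {query, opts}
    end
  end

  # true if the query's source schema defines a :tenant_id field
  defp tenant_scoped?(%Ecto.Query{from: %{source: {_table, schema}}}) when not is_nil(schema) do
    :tenant_id in schema.__schema__(:fields)
  end

  defp tenant_scoped?(_query), do: false
end
```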
Finally, to catch these issues even earlier, I instructed Opus to strengthen our test suite against cross-tenant leakage. We added a new category of tests that explicitly attempt to fetch “Tenant B” records while authenticated as “Tenant A,” ensuring our authorization logic holds up under attack. I also updated our internal testing_strategy.md file, which is a context document we feed to our AI agents, to mandate that every new domain or feature must include these negative assertions. This ensures that as the codebase grows, our AI agents uphold the security standards we’ve established.
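A representative test in that category looks something like this (a sketch; the factory and auth helpers are illustrative, not our actual test API):

```elixir
# Negative assertion: an authenticated user from tenant A must not be able to
# read a record that belongs to tenant B.
test "a user from tenant A cannot read tenant B's invoices", %{conn: conn} do
  tenant_a = insert(:tenant)
  tenant_b = insert(:tenant)
  invoice_b = insert(:invoice, tenant: tenant_b)

  conn =
    conn
    |> log_in_user(insert(:user, tenant: tenant_a))
    |> get("/api/invoices/#{invoice_b.display_number}")

  # respond as if the record doesn't exist, rather than leaking its presence
  assert json_response(conn, 404)
end
```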
Summary
Overall, this re-architecture and migration project was both arduous and tedious, even with Opus. We moved at an incredibly fast pace, but it still felt like it would never end. The days were long and exhausting. During weekdays, I would work for eight hours at my day job, have a quick dinner, then work on the refactor until bedtime. Over a whirlwind period of 19 days, I had Opus perform a “down to the studs” refactor of the entire server codebase. Our Vue app required changes as well, to ensure it rendered display_number instead of IDs and made API requests using display numbers. The latter may change in the future if we switch PUT/DELETE requests to using UUID primary keys.
In the end, I compared the main branch with the migration branch. This is the same image I’ve included at the beginning.

The big disparity between the green and red is due to:
- 7k lines of documentation
- 13k lines of data migration and validation functions
- 3k lines of migration files containing public table and object definitions
- 4k lines of additional tests we wrote to triple-check tenant-scoping and refactored authentication logic (for defense-in-depth)
Definitely the biggest PR I’ve ever worked on!
There was only one situation where we got stuck in what I call the “AI quagmire”. During Part B, the Opus agent realized that the migration order was wrong and had to change things around… which required more changes to the work it had already completed. This definitely threw it off balance, probably because the context filled up right in the middle and some crucial details got lost in Cursor’s auto-summarization. I think I spent an hour trying to guide it back and make it fix all the errors that kept coming up. The moment it resolved the issue and finished that level, I started a brand new chat to leave the polluted context behind.
Conclusion
Thanks to Opus, what started out as frustration and dread was replaced by hope and determination, and I couldn’t be happier with the result. Not only were we able to save a ton of money, but we also got the project done in about two and a half weeks and got ourselves unblocked. We spent an extra week after that where we ran the migration pipeline against a copy of the prod database and did a lot of sanity checks manually just to be really sure everything would work well post-migration. My cofounder also validated his own employer’s workspaces in the test environment by checking them against their counterparts in prod. We did not find any issues during that week, except one API call I had forgotten to update in the Vue app. We deployed the branch to prod the following week.
Opus helped me go from “oh god, this project is going to really suck, how are we even going to do it?” and “do we even need to do it?” to “okay yes, this is definitely doable, we have a clear roadmap, and it will actually be great in the end because the system will become a lot more performant and scalable.”
I want to note a few important points though:
- I was not “vibe coding” at any point. I treated Opus as a pair programmer who is very knowledgeable but also new to this codebase. It did all the driving and I read every single line of code that it wanted to add, change or remove. When it did something overly complex or didn’t quite understand a task, I rejected the changes and instructed it on the approach I wanted.
- The overall process was “discuss, write a plan doc, then implement”. I have found that this greatly helps AI agents stay focused and noticeably reduces the number of mistakes they make, especially on longer tasks. It also creates natural cut-off points where a chat session can be ended and a fresh one can be started, which helps prevent context rot.
- I was paranoid about the possibility of an incomplete or flawed migration, and had Opus add multiple layers of validation and testing. I ran the full migration pipeline locally first, then on staging, then locally again on a (partial) copy of the prod database. On top of that, my cofounder and I performed post-migration sanity checks to validate data integrity and frontend/backend alignment.
- Even though Opus wrote 99% of the code, I was still exhausted. Based on the time frame and total changes, I reviewed at least 3,000 lines of code per day on average, plus all the plan docs that Opus wrote using Cursor’s Plan Mode. Some nights when I closed my eyes to go to sleep, I saw outlines of code. Pretty sure I had several vivid dreams about approving/rejecting code changes.
- Our pre-existing test coverage was a big advantage. It gave Opus clear direction on intended behavior, as well as potential gotchas. This was particularly important because it’s an eight year old codebase, and I didn’t fully remember the details of code from earlier years and all the localized tech debt in each domain.
- Ultimately, I’m the author of the entire codebase, and I know the product inside out. This meant I could make many decisions confidently and relatively quickly. My level of experience with Elixir also ensured that I was able to quickly identify when Opus didn’t write idiomatic Elixir code. This is a bigger issue with models like Sonnet, but even Opus occasionally did things like rely too much on if/else blocks when pattern-matching would have been cleaner.
Is the Title Actually Justified?
Going into this, I expected this migration to be a 4-6 month slog. Not because the individual steps were conceptually hard, but because it’s the kind of work that turns into a thousand really tedious and error-prone tasks, like tracking down and refactoring every function call and query, keeping tenant scoping airtight, writing migrators and validators, and then iterating until the tests pass. And you have to pay attention the whole time.
I also sanity-checked that estimate with the market. Two credible vendors quoted the same timeframe, in the range of $120K to $157.5K, and that didn’t include the internal time we’d still spend answering questions, reviewing, and testing, not to mention any project delays. In other words, my choices were:
- Outsource and pay six-figures, while still worrying about whether it was being done correctly
- Do it myself, get tied up for months while trying to not lose motivation or get sidetracked
- Don’t migrate, let the tech debt build up while optimizing as much as possible, and deal with increasing amounts of operational complexity
Instead, what happened was:
- We got from “this is scary and depressing” to “done and ready to ship” in ~2.5 weeks.
- Direct incremental spend on the model was about $1.2K. Under normal circumstances this would have been a bit more, but during the first couple of weeks after Opus 4.5 came out, Cursor had it priced lower. Still, it was a comparatively small amount.
- I still had to drive: architect, review, ask questions, provide corrections and clarifications, run tests, and catch edge cases. But the model massively compressed the grindy parts, like enumerating every place tenant scoping could break, generating the repetitive migrations and validator scaffolding, refactoring tens of thousands of lines of code and tests, and keeping momentum when I would normally stall out.
Is it fair to say that Opus may have saved us $100K+?
If you compare against the baseline we actually had in hand, i.e. vendor quotes, I would say yes. The math is straightforward:
- Outsource baseline: $120K-$157.5K
- Incremental AI spend: ~$1.2K
- Net avoided cash outlay: ~$118.8K–$156.3K
Even if you discount that heavily (assume we could have negotiated down, assume some scope would have been dropped, assume we’d still burn internal time either way), the result is still in the high five-figures: not just the salary I would (theoretically) pay myself during the project, but also significant opportunity cost of no new customers + no significant new features for existing customers during the 4-6 month migration and refactor. And that’s before you account for the less visible costs, such as delaying new work for months and the risk of accidentally shipping a migration that resulted in data corruption or loss. Those are hard to price precisely, but anyone who has lived through a long-running data-model migration hell knows they’re real.
I want to be perfectly clear though: the model didn’t “replace engineering.” That’s not the claim I’m making at all. In fact, relying on the model to “just handle it” would have been disastrous. There were times where it happily marked tasks done and wanted to move on, and when I questioned it and had it validate the work, it found issues. Compared to other models I have used (including Gemini 3 and GPT 5.1 Codex), these situations were relatively rare with Opus 4.5, and really only happened after the sessions had been going on for a while, but it’s not something that could be left up to chance.
Instead, what I’m saying is that Opus amplified my effort by 8x by acting as a highly capable, tireless mid-level developer who needed some supervision. It allowed a senior engineer (me) to architect the solution and audit the safety mechanisms, while offloading the sheer volume of tedious execution that would have otherwise drowned me, and perhaps even a small team. The result wasn’t “automated engineering,” but rather “massively accelerated engineering”.
That’s why I’m comfortable saying it may very well have saved us $100K+. Not as a vibe, but as a direct comparison to what the market quoted us to accomplish the same outcome, and to what it would have cost us to do it ourselves.
Why didn’t we simply optimize the previous architecture?
I asked myself this several times, both before and during the project. But the fact of the matter was that the “ceiling” of the old architecture was too low for our new business needs. For example, I know that Postgres can handle a lot more tables than we had, and we could have increased max_locks_per_transaction or implemented a sharding strategy. But even with that, adding new database objects to thousands of customers (which we were on trajectory to hit) would have taken hours. One-schema-per-tenant is excellent for enterprise B2B with relatively few high-value customers. Our pivot to the SMB market meant our data topology was no longer aligned with our business model.
I hope this gives you a good idea of what is possible with AI. Specifically, I strongly believe that Opus 4.5 is in a category of its own and represents a watershed moment for our industry. In the right hands, and under the right conditions, it can make a tremendous difference, especially for a resource-constrained startup. It does come with caveats, though, and should not be viewed as a shortcut.