Can LLMs Replace an Entire Software Company? A Reality Check

26 Aug 2025 - tsp
Last update 26 Aug 2025
Reading time 6 mins

Every few months, someone claims that large language models will soon make human programmers obsolete. A very prominent tech person recently suggested that it might be possible to replace all the people in a software company with AI, creating a sort of fully automated factory for code. At first glance this sounds provocative and even plausible, but working with these systems on a daily basis quickly changes that impression.

Yes, there is huge potential to remove certain categories of work. The repetitive and poorly paid entry-level jobs, the tasks that are essentially about connecting two well-documented APIs with a few lines of glue code (like most Python or API development jobs at the moment) will probably not survive. Likewise, debugging small snippets, writing test harnesses, documenting functions, or generating structured boilerplate code - all of that can be done today and often more quickly than by a junior developer. Models are good at producing isolated stubs, at explaining concepts, and at rewriting small functions in different idioms. They can even act as research assistants, summarising documentation and structuring text in ways that accelerate the work of those who know what they are doing.

But this is where the boundary already becomes visible. Once the scope of a project grows, once design goals deviate from mainstream patterns, or once long-term maintainability becomes a requirement, models begin to drift. They tend to reintroduce redundancy, miscorrect their own fixes, and loop over subtle design errors. They are good at producing something that runs, but much less good at producing something clean. As a result, the code often has to be rewritten once the real requirements are clear. This is not a matter of a missing feature or two, but of entire structures that no longer match the goals of the system.

Costs scale quickly. To give a concrete example: I recently rebuilt a Zotero MCP server with extended functionality - full-text semantic search, alternative chunking methods, some orchestration logic. Using a modern orchestrator (Kilo) and a strong model (Claude Sonnet 4 by Anthropic), I got a working version within two evenings. The price tag: around 150 EUR in token cost. That is impressive compared to a week of manual work, but the result was far messier than if I had written it directly myself. Fixing later bugs - because the codebase grows and the model has to be re-fed the relevant context again and again - easily cost 20-30 EUR per bug. These are not costs that will shock any company, but they add up fast: measured against an employee who costs 2000 EUR a month, the advantage shrinks quickly once per-bug costs accumulate. And this was a rather small and compact project. In economic terms: brilliant for prototyping, questionable for production.
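The arithmetic above can be made explicit. The figures are the ones quoted in the text (150 EUR initial build, 20-30 EUR per bug fix, a 2000 EUR/month employee); the bug rate is a purely hypothetical assumption, so this is a sketch of the trade-off, not a verdict:

```python
# Back-of-envelope comparison of LLM token costs vs. an employee.
# Figures from the article; the bug rate is a hypothetical assumption.

BUILD_COST_EUR = 150           # two evenings of orchestrated work (from the text)
COST_PER_BUGFIX_EUR = 25       # midpoint of the 20-30 EUR range (from the text)
EMPLOYEE_COST_PER_MONTH = 2000 # monthly employee cost quoted in the text

def monthly_llm_cost(bugs_per_month: float) -> float:
    """Ongoing token cost once the initial build is done."""
    return bugs_per_month * COST_PER_BUGFIX_EUR

def breakeven_bugs_per_month() -> float:
    """Bug fixes per month at which token costs match the employee."""
    return EMPLOYEE_COST_PER_MONTH / COST_PER_BUGFIX_EUR

print(monthly_llm_cost(10))        # 250.0 EUR/month at a hypothetical 10 bugs
print(breakeven_bugs_per_month())  # 80.0 bug fixes/month to match the salary
```

The raw break-even point looks comfortable, but it only counts tokens: the steering effort, the re-feeding of context, and the eventual rewrite described above are exactly the costs this simple model leaves out.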

The dream of simply "running it all locally" to cut costs is equally misleading. For small and mid-sized models (7-70B parameters), local inference is already feasible, and for many tasks it is sufficient. But the very large configurations that make real software projects manageable - context windows beyond 200k tokens, 300-500B parameters - would require terabytes of VRAM and RAM. Only hyperscalers with dedicated clusters can afford to keep such models loaded and available. For everyone else, the economics point to cloud usage, with all the cost structures and steering effort that implies.
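A rough estimate shows why the memory demands scale the way they do. The model dimensions below (layer count, grouped-query attention heads, head size) are hypothetical assumptions for a frontier-scale model, not the specification of any particular one:

```python
# Back-of-envelope memory estimate for serving a large model locally.
# All model dimensions are hypothetical assumptions.

BYTES_FP16 = 2  # bytes per parameter/activation at 16-bit precision

def weight_memory_gb(n_params: float) -> float:
    """Memory needed just to hold the weights."""
    return n_params * BYTES_FP16 / 1e9

def kv_cache_gb(n_layers: int, ctx_len: int, n_kv_heads: int, head_dim: int) -> float:
    """KV cache for one sequence: K and V tensors per layer, per token."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * BYTES_FP16 / 1e9

weights = weight_memory_gb(400e9)          # a 400B-parameter model at fp16
cache = kv_cache_gb(120, 200_000, 8, 128)  # one 200k-token sequence with GQA
print(f"weights ~{weights:.0f} GB, KV cache ~{cache:.0f} GB per sequence")
```

With ~800 GB for the weights alone plus roughly 100 GB of KV cache per active 200k-token sequence, serving even a handful of concurrent users pushes past a terabyte of fast memory - hence the hyperscaler economics.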

The Future Outlook: Collaboration, Limits, and the Question of Replacement

If one looks beyond the hype and takes a sober perspective, the most plausible development is a growing symbiosis between human developers and LLM-based systems. The workflow of tomorrow is not a matter of pressing a button and waiting for an entire application to appear; it is more like conducting an orchestra of agents that generate, test, refactor, and document code under human supervision, with human collaboration and feedback.

In such a setting, the human role shifts from typing every line to steering, curating, and integrating. Instead of writing the fifteenth variant of a logging utility, the engineer might specify design constraints, check whether the produced structure adheres to security and performance requirements, and decide which of several machine-generated alternatives fits the architecture best. The machine provides the bulk output; the human ensures coherence and direction - and then makes fully manual or guided adjustments along the specific paths where the details matter.

There are natural limits to this arrangement. One limit is contextual coherence: models, even with large contexts, cannot yet maintain a consistent long-term memory of a project that spans millions of lines and years of development. They can write documentation for themselves (usually in the form of Markdown files in the project directory). But they can only ever act on the slice of the project that is visible to them at inference time, which makes them prone to "forgetting" decisions made weeks or months earlier. A second limit is design deviation: if a project deliberately breaks with mainstream frameworks or prioritizes non-standard constraints (for example, running on exotic hardware, or adhering to unusual security policies), the models tend to resist, drifting back to common denominators.

Could there come a point where the asymmetry flips, where humans are the minor addition to an essentially autonomous development process? Most likely, but such a point would require two conditions that are not visible yet. First, the models would need persistent, reliable, and economically viable long-term memory across entire projects - not just retrieval, but true project state that remains consistent over years. Second, they would need to internalize non-mainstream design constraints as first-class citizens rather than temporary instructions that are easily ignored once the pattern deviates from training priors. Without these two, they remain powerful assistants, but assistants nonetheless.

It is conceivable that in the distant future such capabilities will arrive, at which point the role of humans may shift further toward product vision, ethics, and integration with the rest of society. But even then, humans are unlikely to become irrelevant. Software is not just code; it is also about aligning with human institutions, economic trade-offs, and especially social expectations. These are domains where human judgment remains indispensable - mostly because of the expectations of other humans, not because machines are incapable of it.

So the future is not one of full automation, but of layered collaboration. Machines will dominate the repetitive and the mechanical. Humans will dominate the strategic and the responsible. And between the two lies a broad grey zone, where productivity can rise dramatically - but only when both sides are present.

Conclusion

So can LLMs replace everyone in a software company?

They can replace some, accelerate many, and support nearly all - but they cannot yet replace the central role of human judgment and planning. The companies that will thrive are not those who imagine firing all their engineers, but those who integrate AI into the workflow as a force multiplier. The winners will be those who accept fragility as a risk, who invest in keeping systems coherent, and who understand that speed without direction leads only to collapse. And, of course, those who recognize that they still need junior developers who can grow into senior roles - even if, for the first years of their careers, it would seem cheaper to substitute them with LLMs.

The future is not one of machine companies and redundant people. It is one of hybrid organizations, where humans and machines form an uneasy but powerful collaboration.


Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
