as my employer mandates #ai #agent use at work, a few observations 🧵
- llms are prone to wild goose chases — searching for nonexistent documents, importing nonexistent libraries, running nonexistent tools
- despite this, they are able to work independently on narrow tasks, like debugging tests or refactoring a < 10k line codebase
- without a good configuration file (allowlist the right set of tools, block the rest, recommend the most relevant tools in the agent file), they require babysitting
- “context rot” happens long before a model hits its max token window size. getting nontrivial work done requires “subagents”
- none of the tools i have at the office are very good at knowing when to spawn subagents; ideally, they’d spawn much more frequently
- on days where i actually use llms, i’m less productive (when measuring in terms of business outcomes), but i produce more PRs and lines of change due to the volatile nature of llm-driven development
- an individual agent is sluggish, and can take three hours to finish a task i’d complete in just one hour
- the proposed solution to this sluggishness, running many agents in parallel, doesn’t work. most of my time is spent getting organizational alignment, not programming, and when i do program i tend to have only one or two tasks
- the worst bottlenecks, as always, are cross-team agreements and tasks. these tasks are the least suited for an llm, because they require review from many humans
- the teams which report “success” with ai agents are the ones building greenfield internal tools with lots of cool GUI elements. they write relatively little business logic, and the cost of their mistakes is near-zero
- these teams are boosted by leadership for their “productivity,” despite the fact that they don’t deliver any tangible revenue or cost savings. their business value is negligible!
- leadership is using llms for hobby projects, and extrapolating those productivity gains across the workforce
- metrics like PR count and lines of code that any competent engineering manager once laughed out of the room are now presented on dashboards alongside things that actually matter, like timelines and budgets
- concern about security is near-zero, despite the fact that i work in infosec. i’ve pointed out obvious attacks on the system, only to be pulled aside and asked not to “miss the forest for the trees”
- branding ai-less automation projects as “agents” is the best way to get them funded
- many of the metrics we used to value, like api stability, uptime, latency, and customer impact, are falling by the wayside
- ai agents are part of a larger push to “lower the barrier to entry” in engineering. in practice, this means letting non-technical people “do” technology
- this is a bit like letting non-surgeons do surgery
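the configuration point above is concrete enough to sketch. assuming a claude-code-style settings file (the tool names and schema here are illustrative assumptions, not any specific product’s documented format), the allowlist/blocklist idea looks roughly like:

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Edit",
      "Bash(go test ./...)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "WebFetch",
      "Bash(curl:*)",
      "Bash(rm:*)"
    ]
  }
}
```

the agent file (CLAUDE.md, AGENTS.md, or whatever your tool reads) then names the handful of tools the agent should reach for first, which is what cuts down on the wild goose chases from the first bullet.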
despite everything, i see some value in llms. they are handy for debugging and prototyping, or for sending off on a boring refactor. but they aren’t the 10x productivity boost many start-up guys promise. once you’re out of the bootstrapping phase, the cost of a mistake exceeds the cost of a thought-out design or of hand-writing a feature. that alone disqualifies the “end of engineering” argument.
-
many ai best practices are also engineering best practices: exhaustive documentation, good encapsulation, well-written tests. so, as our industry so often does, we’ll reinvent the right way of doing things in our attempt to supersede it. the READMEs we get along the way are positive externalities, and they’ve helped me considerably.
-
in conclusion:
