Ask HN: Anyone struggling to get value out of coding LLMs?

321 points by bjackman 2 days ago

I use LLMs daily for stuff like:

- solving tasks that just require applying knowledge ("here's a paste of my Python import structure. I don't write Python often and I'm aware I'm doing something wrong here because I get this error; tell me the proper way to organise the package").

- writing self-contained throwaway pieces of code ("here's a paste of my DESCRIBE TABLE output, write an SQL query to show the median [...]").

- as a debugging partner ("I can SSH to this host directly, but Ansible fails to connect with this error, what could be causing this difference").

All these use cases work great, I save a lot of time. But with the core work of writing the code that I work on, I've almost never had any success. I've tried:

- Cursor (can't remember which model, the default)

- Google's Jules

- OpenAI Codex with o4

I found in all cases that the underlying capability is clearly there (the model can understand and write code) but the end-to-end value is not there at all. It could write code that _worked_, but getting it to generate code that I am willing to maintain and "put my name on" took longer than writing the code myself would have.

I had to micromanage them endlessly ("be sure to rerun the formatter, make sure all tests pass", "please follow the coding style of the repository", "you've added irrelevant comments, remove those", "you've refactored most of the file but forgot a single function"). It would take many, many iterations on trivial issues, and because these iterations are slow that just meant I had to context switch a lot, which is also exhausting.

Basically it was like having an intern who has successfully learned the core skill of programming but is not really capable of good collaboration and needs to be babysat all the time.

I asked friends who are enthusiastic vibe coders and they basically said "your standards are too high".

Is the model for success here that you just say "I don't care about code quality because I don't have to maintain it because I will use LLMs for that too?" Am I just not using the tools correctly?

gyomu 2 days ago

There are two kinds of engineers.

Those who can’t stop raving about how much of a superpower LLMs are for coding, how it’s made them 100x more productive, and is unlocking things they could’ve never done before.

And those who, like you, find it to be an extremely finicky process that requires an extreme amount of coddling to get average results at best.

The only thing I don’t understand is why people from the former group aren’t all utterly dominating the market and obliterating their competitors with their revolutionary products and blazing fast iteration speed.

  • Philip-J-Fry 2 days ago

    I find LLMs 100x more productive for greenfield work.

    If I want to create a React app with X amount of pages, some Redux stores, Auth, etc. then it can smash that out in minutes. I can say "now add X" and it'll do it. Generally with good results.

    But when it comes to maintaining existing systems, or adding more complicated features, or needing to know business domain details, an LLM is usually not that great for me. They're still great as a code suggestion tool, finishing lines and functions. But as far as delivering whole features goes, they're pretty useless once you get past the easy stuff. And you'll spend as much time directing the LLM to do this kind of thing as you would just writing it yourself.

    What I tend to do is write stubbed out code in the design I like, then I'll get an LLM to just fill in the gaps.
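
    As a rough illustration of that stub-then-fill workflow (all names below are made up for the example, not from any real project): I sketch the types and signatures myself and only ask the LLM to complete the marked bodies.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Invoice:
        customer_id: str
        amount_cents: int
        currency: str = "USD"

    def validate_invoice(invoice: Invoice) -> list[str]:
        """Return human-readable validation errors (empty list if valid)."""
        # TODO(llm): fill in; check non-empty customer_id, positive amount, known currency
        raise NotImplementedError

    def submit_invoice(invoice: Invoice) -> str:
        """Validate the invoice, persist it, and return its new ID."""
        errors = validate_invoice(invoice)
        if errors:
            raise ValueError("; ".join(errors))
        # TODO(llm): fill in; persist via the existing repository layer
        raise NotImplementedError
    ```

    That way the LLM only ever sees small, well-bounded gaps, so it stays inside the design I already chose.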

    These people who say LLMs make them 100x more productive probably are only working on greenfield stuff and haven't got to the hard bit yet.

    Like everyone says, the first 90% is the easy bit. The last 10% is where you'll spend most of your time, and I don't see LLMs doing the hard bit that well currently.

    • joshstrange 2 days ago

      I couldn’t agree more and I’ve said the same thing many times.

      I have yet to see an LLM-generated app not collapse under its own weight after enough iterations/prompts. It gets stuck in loops (removing and adding the same code/concept), it gets hung up on simple errors, etc.

      For greenfield it’s amazing, no doubt, but unless you are watching it very closely and approving/reviewing the code along the way it will go off the rails. At a certain point it’s easier to add the new feature or make the modification yourself. Even if the LLM could do it, it would burn tons of money and time.

      I expect things to get better, this will not always be the state of things, but for now “vibe coding” (specifically not reviewing/writing code yourself) is not sustainable.

      Most people doing it have a github profile that is a mile wide and a meter deep.

      LLMs are amazing and useful, but “vibe coding” with them is not sustainable currently.

      • noosphr 2 days ago

        >I expect things to get better, this will not always be the state of things, but for now “vibe coding” (specifically not reviewing/writing code yourself) is not sustainable.

        It will not.

        And I say this as someone who's been building internal LLM tools since 2021.

        The issue is their context window. If you increase the context window so they can see more code, costs skyrocket as n^2 in the size of the code base. If you don't, then you have all the issues people describe in this thread.

        The reason I have a job right now is that you can get around this by building tooling for intelligent search that limits the overfilling of each context window. This is neither easy, fast, nor cheap when done at scale. Worse, the problems you run into when doing this are at best very weakly related to the problems the major AI labs are focusing on currently - I've interviewed at two of the top five AI labs and none of the people I talked to cared about or really understood what a _real_ agentic system that solves coding should look like.

        • FLT8 19 hours ago

          I can't help but wonder whether the solution here is something like building a multi-resolution understanding of the codebase. All the way from an architectural perspective including business context, down to code structure & layout, all the way down to what's happening in specific files and functions.

          As a human, I don't need to remember the content of every file I work on to be effective, but I do need to understand how to navigate my way around, and enough of how the codebase hangs together to be able to make good decisions about where new code belongs, when and how to refactor etc.. I'm pretty sure I don't have the memory or reading comprehension to match a computer, but I do have the ability to form context maps at different scales and switch 'resolution' depending on what I'm hoping to achieve.
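
          A minimal sketch of what that could look like as a data structure (purely illustrative; none of these names come from an existing tool): summaries kept at several resolutions, with context assembled coarse-to-fine until a token budget runs out.

          ```python
          from dataclasses import dataclass, field

          @dataclass
          class CodeMapNode:
              name: str      # e.g. "billing", "billing/invoices.py", "submit_invoice()"
              level: str     # "architecture" | "module" | "file" | "symbol"
              summary: str   # short natural-language description at this resolution
              children: list["CodeMapNode"] = field(default_factory=list)

          def assemble_context(root: CodeMapNode, budget: int, cost=lambda s: len(s) // 4) -> list[str]:
              """Breadth-first walk: coarse summaries go in first, finer detail
              is added only while the token budget allows."""
              picked, queue, used = [], [root], 0
              while queue:
                  node = queue.pop(0)
                  c = cost(node.summary)
                  if used + c <= budget:
                      picked.append(f"[{node.level}] {node.name}: {node.summary}")
                      used += c
                      queue.extend(node.children)
              return picked
          ```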

        • quartzeee a day ago

          > building tooling for intelligent search that limits the overfill of each context window

          I'm interested to know what you mean by this. In our system we've been trying to compress the context, but this is the first I've seen about filtering it down.

          • noosphr a day ago

            For general text you run some type of vector search against the full-text corpus to see what relevant hits there are and where. Then you feed the first round of results into a ranking/filtering system that does pairwise comparisons between the chunks that scored well in the vector search. Contract/expand until you've reached the limit of the context window for your model, then run against the original query.

            For source code, you are even luckier since there are a lot of deterministic tools which provide solid grounding, e.g., etags, and the languages themselves enforce a hierarchical tree-like structure on the source code, viz. block statements. The above means that ranking and chunking strategies are solved already - which is a huge pain for general text.

            The vector search is then just an enrichment layer on top which brings in documentation and other soft grounding text that keeps the LLM from going berserk.
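
            A very rough sketch of the shape of that pipeline (embed() and llm_prefers() are stand-ins for whatever embedding model and pairwise LLM judge you plug in, not real APIs):

            ```python
            import numpy as np

            def cosine(a, b):
                return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

            def pairwise_rank(query, candidates, llm_prefers):
                """Order candidates by how often the LLM judges one more relevant
                than another; llm_prefers(query, a, b) -> True if a beats b."""
                wins = [0] * len(candidates)
                for i in range(len(candidates)):
                    for j in range(i + 1, len(candidates)):
                        if llm_prefers(query, candidates[i], candidates[j]):
                            wins[i] += 1
                        else:
                            wins[j] += 1
                order = sorted(range(len(candidates)), key=lambda i: wins[i], reverse=True)
                return [candidates[i] for i in order]

            def build_context(query, chunks, embed, llm_prefers, limit, cost=lambda s: len(s) // 4):
                """chunks: code/doc chunks split along syntactic boundaries
                (etags, block statements), not fixed-size windows."""
                q = embed(query)
                # cheap vector search first, expensive pairwise ranking on the survivors
                candidates = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:50]
                packed, used = [], 0
                for chunk in pairwise_rank(query, candidates, llm_prefers):
                    if used + cost(chunk) > limit:
                        break              # stop once the context window is full
                    packed.append(chunk)
                    used += cost(chunk)
                return packed              # prepend to the original query for the model
            ```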

            Of course, none of the commercial offerings come even close to letting you do this well. Even the dumb version of search needs to be a self-recursive agent which comes with a good set of vector embeddings and the ability to decide if it's searched enough before it starts answering your questions.

            If you're interested drop a line on my profile email.

      • RajT88 2 days ago

        Coding LLM's: Ruby off rails?

      • bredren 2 days ago

        The way to get past the removing and adding is to have the prompt include what has already been done and what is left to do.

        Then specify the need to conclude the work by a deadline.

        These kinds of things cause the LLM to “finish” tasks and try to move on, or to say it is done when it actually is.

        This won’t let you skip the curation of output along the way, but at least some of the stumbling around can be handled with prompting.

      • cwsx a day ago

        > For greenfield it’s amazing

        I'll preface this comment with: I am a recent startup owner (and the only dev, which is important) and my entire codebase has been generated via Sonnet (mostly 3.7, now 4.0). If you actually looked at the work I'm (personally) producing, I guess I'm more of a product-owner/project-manager, as I'm really just overseeing the development.

        > I have yet to see an LLM-generated app not collapse under it’s own weight after enough iterations/prompts.

        There's a few crucial steps to make an LLM-generated app maintainable (by the LLM):

        - _have a very, very strong SWE background_; ideally as a "strong" Lead Dev, _this is critical_

        - your entire workflow NEEDS to be centered around LLM-development (or even model-specific):

          - use MCPs wherever possible and make sure they're specifically configured for your project
        
          - don't write "human" documentation; use rule + reusable prompt files
        
          - you MUST do this in a *very* granular but specialized way; keep rules/prompts very small (like you would when creating tickets)
        
          - make sure rules are conditionally applied (using globs); do not auto include anything except your "system rules"
        
          - use the LLM to generate said prompts and rules; this forces consistency across prompts, very important
        
          - follow a typical agile workflow (creating epics, tickets, backlogs etc)
        
          - TESTS TESTS AND MORE TESTS; add automated tools (like linters) EVERYWHERE you can
        
          - keep your code VERY modular so the LLM can keep a focused context, rules should provide all key context (like the broader architecture); the goal is for your LLM to only need to read or interact with files related to the strict 'current task' scope
        
          - iterating on code is almost always more difficult than writing it from scratch: provided your code is well architected, no single rewrite should be larger than a regular ticket (if the ticket is too large then it needs to be split up)
        
        This is off the top of my head so it's pretty broad/messy but I can expand on my points.

        LLM-coding requires a complete overhaul of your workflow so it is tailored specifically to an LLM, not a human, but this is also a massive learning curve (that takes a lot of time to figure out and optimize). Would I bother doing this if I were still working on a team? Probably not; I don't think it would've saved me much time in a "regular" codebase. As a single developer at a startup? This is the only way I've been able to get "other startup-y" work done while also progressing the codebase - the value is in being able to do multiple things at a time: let the LLM work and intermittently review the output while you get on with other things.

        The biggest tip I can give: LLMs struggle at "coding like a human" and are much better at "bad-practice" workflows (e.g. throwing away large parts of code in favour of a total rewrite) - let the LLM lead the development process, with the rules/prompts as guardrails, and try to stay out of its way while it works (instead of saying "hey X thing didn't work, go fix that now") - hold its hand but let it experiment before jumping in.

        • tmaly a day ago

          Do you have an example of a rule file? Or the MCPs you use?

          • cwsx a day ago

            MCPs:

              - `server-sequential-thinking` (MVP)
              - `memory` (2nd MVP, needs custom rules for config)
              - `context7`
              - `filesystem`
              - `fetch`
              - `postgres`
              - `git`
              - `time`
            
            
            Example rules file for ticketing system:

            ```

            # Ticket Management Guidelines

            This document outlines the standardized approach to ticket management in the <redacted> project. All team members should follow these guidelines when creating, updating, or completing tickets.

            ## Ticket Organization

            Tickets are organized by status and area in the following structure:

            TICKETS/
              COMPLETED/    - Finished tickets
                BACKEND/    - Backend-related tickets
                FRONTEND/   - Frontend-related tickets
              IN_PROGRESS/  - Tickets currently being worked on
                BACKEND/
                FRONTEND/
              BACKLOG/      - Tickets planned but not yet started
                BACKEND/
                FRONTEND/

            ## Ticket Status Indicators

            All tickets must use consistent status indicators:

            - *BACKLOG* - Planned but not yet started
            - *IN_PROGRESS* - Currently being implemented
            - *COMPLETED* - Implementation is finished
            - *ABANDONED* - Work was stopped and will not continue

            ## Required Ticket Files

            Each ticket directory must contain these files:

            1. *Main Ticket File* (TICKET_.md):
               - Problem statement and background
               - Detailed analysis
               - Implementation plan
               - Acceptance criteria

            2. *Implementation Plan* (IMPLEMENTATION_PLAN.md):
               - Detailed breakdown of tasks
               - Timeline estimates
               - Success metrics

            3. *Implementation Progress* (IMPLEMENTATION_PROGRESS.md):
               - Status updates
               - Issues encountered
               - Decisions made

            4. *Design Documentation* (DESIGN_RECOMMENDATIONS.md), when relevant:
               - Architecture recommendations
               - Code patterns and examples
               - Error handling strategies

            5. *API Documentation* (API_DOCUMENTATION.md), when applicable:
               - Interface definitions
               - Usage examples
               - Configuration options

            ## Ticket Workflow Rules

            ### Creating Tickets

            1. Create tickets in the appropriate BACKLOG directory
            2. Use standard templates from .templates/ticket_template.md
            3. Set status to *Status: BACKLOG*
            4. Update the TICKET_INDEX.md file

            ### Updating Tickets

            1. Move tickets to the appropriate status directory when status changes
            2. Update the status indicator in the main ticket file
            3. Update the "Last Updated" date when making significant changes
            4. Document progress in IMPLEMENTATION_PROGRESS.md
            5. Check off completed tasks in IMPLEMENTATION_PLAN.md

            ### Completing Tickets

            1. Ensure all acceptance criteria are met
            2. Move the ticket to the COMPLETED directory
            3. Set status to *Status: COMPLETED*
            4. Update the TICKET_INDEX.md file
            5. Create a completion summary in the main ticket file

            ### Abandoning Tickets

            1. Document reasons for abandonment
            2. Move to COMPLETED/ABANDONED directory
            3. Set status to *Status: ABANDONED*
            4. Update the TICKET_INDEX.md file

            ## Ticket Linking

            When referencing other tickets, use relative links with appropriate paths:

            markdown @TICKET_NAME

            Ensure all links are updated when tickets change status.

            ## Ticket Cleanup and Streamlining

            ### When to Streamline Tickets

            Tickets should be streamlined and cleaned up at major transition points to maintain focus on remaining work:

            1. *Major Phase Transitions* - When moving between phases (e.g., from implementation to testing)
            2. *Milestone Achievements* - After completing significant portions of work (e.g., 80%+ complete)
            3. *Infrastructure Readiness* - When moving from setup/building to operational phases
            4. *Team Handoffs* - When different team members will be taking over the work

            ### What to Streamline

            *Replace Historical Implementation Details With:*
            - Brief completed tasks checklist (high-level achievements)
            - Current status summary
            - Forward-focused remaining work

            *Remove or Simplify:*
            - Detailed session-by-session progress logs
            - Extensive implementation decision histories
            - Verbose research findings documentation
            - Historical status updates and coordination notes

            ### Why Streamline Tickets

            1. *Git History Preservation* - All detailed progress, decisions, and implementation details are preserved in git commits
            2. *Clarity for Future Work* - Makes it easier to quickly understand "what needs to be done next"
            3. *Team Efficiency* - Anyone picking up the work can immediately see current state and next steps
            4. *Maintainability* - Shorter, focused tickets are easier to read, understand, and keep updated

            ### How to Streamline

            1. *Archive Detailed Progress* - Historical implementation details are preserved in git history
            2. *Create Completion Summary* - Replace detailed progress with a brief "What's Complete" checklist
            3. *Focus on Remaining Work* - Make current and future phases the primary content
            4. *Update Status Sections* - Keep status concise and action-oriented
            5. *Preserve Essential Context* - Keep architectural decisions, constraints, and key requirements

            *Goal*: Transform tickets from "implementation logs" into "actionable work plans" while preserving essential context.

            ## Maintenance Requirements

            1. Keep the TICKET_INDEX.md file up to date
            2. Update "Last Updated" dates when making significant changes
            3. Ensure all ticket files follow the standardized format
            4. Include links between related tickets in both directions

            ## Complete Documentation

            For detailed instructions on working with tickets, refer to:

            - @Ticket Workflow Guide
            - @Ticket Index
            - @Tickets README

            ```

    • virgilp 2 days ago

      I've recently been able to use an LLM on a large-ish internal project to find a bug. The prompt took the form of "here are the symptoms I observe, and some hypotheses; tell me where the code that handles this case is written" (it was a brand new repo that I hadn't looked at before - code written by a different team, who were claiming some weird race condition and weren't really willing to look into the bug). Basically I was asking the LLM to tell me where to look, and it actually found the issue itself.

      Not 100x more productive, that's an exaggeration... not even 10x. But it helps. It is an extremely competent rubber duck [1].

      [1] https://en.wikipedia.org/wiki/Rubber_duck_debugging

      • tormeh 2 days ago

        I too did this (although on a small project), and I was incredibly impressed. My problem with it is that I first did it myself, and it was fairly quick and easy. The hard part was figuring out that there was a bug, and how exactly the bug behaved. The LLM helped with the easy part, but I don't know how to even explain the difficult part to it. There was no way to know which repo the problem is in, or that it wasn't a user error.

    • insane_dreamer 2 days ago

      > create a React app with X amount of pages, some Redux stores, Auth, etc.

      Unless you're a contractor making basic websites for small businesses, how many of these do you need to make? This is really a small fraction of the job of most developers, except for entry-level devs.

      > when it comes to maintaining existing systems, or adding more complicated features, or needing to know business domain details,

      This is what experienced developers will spend 90% of their time doing.

      So yes, LLMs can replace entry-level devs, but not the more experienced ones.

      This raises the question: if companies stop hiring entry-level devs because LLMs can do their job, how will new devs get experience?

      • tkiolp4 2 days ago

        Your conclusion is wrong, I think. LLMs cannot magically replace entry-level devs. Who’s gonna ask the LLM to create the basic website? The product owner? The accountant? The sales guy? They wouldn’t know how to be precise enough to state what they actually need. An entry-level engineer would make use of the LLM to produce the website and push it to production. Hell, only engineers know that the devil is in the details. Quick example: let’s say a Contact Us page needs to be built. There are tons of details that need to be accounted for, and the LLM may skip them if it is not told about them: where does the data from the form go? A backend endpoint? What about captcha? What about analytics? What about validation of specific fields? What about the friendly URL? And disabling the button after sending to prevent duplicate requests?

        An LLM is very capable of implementing all of that… if only someone who knows all of that stuff tells it first.

        And most importantly: LLMs don’t challenge the task given. Engineers do. Many times, problems are solved without code.

        • jimbokun 2 days ago

          A lot of the value a good engineer provides is saying “you don’t want to do that, it’s a bad idea for these reasons.” Or “that’s actually easier than you think. We could do it this way.”

          Knowing what’s possible, difficult, easy, risky, cheap, expensive, etc.

        • insane_dreamer a day ago

          > An LLM is very capable of implementing all of that… if only someone who knows all of that stuff tell them first.

          I agree with you, but I don't think it's the entry-level dev who is going to be interfacing with the client to discuss and resolve all the questions you posed, and/or decide on them. That was part of the OP's point -- that much of their time is spent interfacing with the client to very precisely determine the requirements.

    • gedy 2 days ago

      > If I want to create a React app with X amount of pages, some Redux stores, Auth, etc. then it can smash that out in minutes. I can say "now add X" and it'll do it. Generally with good results.

      Not discounting your experience, but a lot of these examples are about frameworks that never had good bootstrapping the way Rails does/did. LLMs are really good at boilerplate, but maybe this points to such stacks needing too much fiddling to get going, rather than to 10x coder AI.

      • Jensson 2 days ago

        You can get similar results by downloading a boilerplate template project for that though.

        LLM makes that a bit easier and faster to do, but is also a bit more error prone than a template project.

        • cowsandmilk 2 days ago

          I’m not sure that LLM makes it easier to do. Pain points I’ve seen:

          1. You have to remember all the technologies you need included, my company template already has them.

          2. LLM doesn’t have a standardized directory structure, so you end up with different projects having different structures and file naming conventions. This makes later refactoring or upgrades across multiple projects less automatable (sometimes this can be solved by having an LLM do those, but they often are unsuccessful in some projects still)

          3. LLMs have a knowledge cutoff. If your company has already moved to a version after that knowledge cutoff, you need to upgrade the LLM generated code.

          I very much prefer having a company template to asking an LLM to generate the initial project.

          • gedy 2 days ago

            I agree; my initial comment was basically that, plus seeing a lot of folks (especially those with debatable technical skills) being very impressed with LLMs for boilerplate generation, e.g. "I built a WHOLE APP", etc.

            • skydhash 2 days ago

              Maybe they haven't used the wizards in IDEs like IntelliJ and Visual Studio. You can bootstrap things so quickly that you don't even think about it, just like creating a new file in the editor.

    • sph 2 days ago

      > I find LLMs 100x more productive for greenfield work.

      Greenfield != boilerplate and basic CRUD app.

      I'm a consultant writing greenfield apps solo, and 90% of my time is spent away from my editor thinking, planning, designing, meeting with stakeholders. I see no benefit in using a low-IQ autocomplete tool to automate a small part of the remaining 10% of the job, the easiest and most enjoyable part in fact.

      Personally I find most of coding I do is unsuitable for LLMs anyway, because I don't need them to regurgitate standard logic when libraries are available, so most of that 10% is writing business logic tailored for the program/client.

      Call me elitist (I don't care) but LLMs are mostly useful to two kinds of people: inexperienced developers, and those who think that hard problems are solved with more code. After almost two decades writing software, I find I need less and less code to ship a new project; most of my worth is thinking hard away from a keyboard. I really don't see the use of a machine that egregiously and happily writes a ton of code. Less is more, and I appreciate programming-as-an-art rather than being a code monkey paid by the line of code I commit.

      Disclaimer: I am anti-LLM by choice so my bias is opposite than most of HN.

      • s_ting765 2 days ago

        I completely agree with you that the coders who are "smitten" by LLMs are just inexperienced. I personally find that LLMs get subtle improvements in capability over time and it's usually worthwhile to check in on the progress from time to time. Even if it's just for fun.

        • l33tman 2 days ago

          I can see why you would chime in to say that in your experience you don't get any value out of it, but to chime in to say that the millions of people who do are "inexperienced" is pretty offensive. In the hands of skilled developers these tools are a complete gamechanger.

          • bluefirebrand 2 days ago

            > In the hands of skilled developers these tools are a complete gamechanger.

            This is where both sides are basically just accusing the other of not getting it

            The AI coders are saying "These tools are a gamechanger in the hands of skilled developers" implying if you aren't getting gamechanging results you aren't skilled

            The non-AI coders are basically saying the same thing back to them. "You only think this is gamechanging because you aren't skilled enough to realize how bad they are"

            Personally, I've tried to use LLMs for coding quite a bit and found them really lacking

            If people are finding a lot of success with them, either I'm using them wrong and other people have figured out a better way, or their standards are way, way lower than mine, or maybe they wind up spending just as long fixing the broken code as it would take me to write it

          • cjaybo 2 days ago

            Are these “millions of people” in the room with us now?

          • codr7 2 days ago

            Why?

            What does it add that you couldn't have written yourself faster if you're so skilled?

            • SpaceNugget 2 days ago

              > if you're so skilled?

              I think this is needlessly snarky and also presupposes something that wasn't said. No one said it can write something that the developer couldn't write (faster) themselves. Tab complete and refactoring tools in your IDE/editor don't do anything you can't write on your own but it's hard to argue that they don't increase productivity.

              I have only used cline for about a week, but honestly I find it useful in a (imo badly organized) codebase at work as an auto-grepper. Just asking it "Where does the check for X take place" where there's tons of inheritance and auto-constructor magic in a codebase I rarely touch, it does a pretty good job of showing me the flow of logic.

          • mattmanser 19 hours ago

            Where are these millions, and where is their output? You're in an echo chamber mate, there aren't millions of people using AI to do significant amounts of work.

            Indie hackers just did an article on 4 vibe coded Startups and they all seem like a joke.

            And they could only find 4!

            I didn't look at them all, but the flight sim is spectacularly bad, the revenue numbers obviously unsustainable and it looks like something moderately motivated school children might have made for a school project in a week.

      • bigtex 2 days ago

        You shared my sentiments exactly. What none of these "all junior developers will be out of a job by 2026" proclamations ever deal with is the non-coding stuff. Sure it can generate a boilerplate app in 1.5 seconds, but can it communicate with a stakeholder about the requirements and ask the right questions to determine the scope and importance of features? I just imagine my Sales President spending more than 90 seconds trying to write the proper prompt before he gives up and calls a human. There is just no way a C-suite executive is sitting in front of a computer typing in prompts.

        • ungreased0675 2 days ago

          I agree with the larger point you’re making, but an LLM absolutely can ask the right questions to determine scope and features for a basic application. It can even turn those into decent user stories. It’s the edge cases and little details that will be messed up.

      • macNchz 2 days ago

        Where I've found LLM coding assistants really effective in my client consulting work is around iteration: I can deliver so many more versions of an application UI in a given amount of time that it really changes the entire process of how we dig into a project. Where I might previously have wanted to start with low fidelity wireframes and go through approvals to avoid proto-duction and pain down the line when the client didn't like something, now we can rough out the whole thing in a functional proof of concept and then make sweeping changes live on a call as we test different interaction paradigms.

      • ddingus 2 days ago

        Your disclaimer adds value. Well done and appreciated!

    • zmgsabst 2 days ago

      I don’t find the same, e.g. for greenfield AI projects.

      It can do pieces in isolation, but requires significant handholding on refactors to get things correct (i.e., its initial version has subtle bugs) - and sometimes requires me to read the docs to find the right function because it hallucinates that things from other frameworks will work in its code.

    • tmaly a day ago

      I have not seen much discussion on how to properly work with legacy code using LLMs.

      Michael Feathers' book comes to mind when thinking about the topic. One gets the idea that you have to write a lot of tests. But what happens when there are no tests, comments, documents, etc.?

    • jimbokun 2 days ago

      To me that makes coding LLMs the new Excel.

      I mean that in the sense that Excel is the tool that non developers could take the farthest to meet their needs without actually becoming a full time developer. But then to take it past that point a professional developer needs to step in.

      I imagine non devs vibe coding their way to solutions far more complex than Excel can handle. But once they get past the greenfield vibe coding stage they will need a pro to maintain it, scale it, secure it, operationalize it, etc.

    • wejick 2 days ago

      Not really my case. I've found that codebases with good code benefit more from an LLM, but it's not a prerequisite.

      I just rewrote 300ish advanced PostgreSQL queries to mysql queries. The process is not magical, but what would have taken me 1 week took 3 days. Now I'm in the testing phase; it seems promising.

      The point is, if we can find a way to work along with the agent, it can be very productive.

      • daveguy 2 days ago

        > I just rewrote 300ish advanced PostgreSQL queries to mysql queries.

        Translation from one set of tokens to another is exactly the primary use case of LLMs. This is exactly what it should be good at. Developing new queries, much less so. Translation from one set of database queries to another was already very well defined and well covered before LLMs came about.

        • skydhash 2 days ago

          Also, both are formal grammars, so if you really wanted to create a 1:1 translator it's possible to do so (see virtual machines). But it's not that useful per se, as no one really switches databases on a whim. If you really want to do so, you want to do it correctly.

        • theappsecguy a day ago

          I mean, using a good ORM with DB adapter options could achieve this in minutes. Sure, an LLM has utility here for raw queries, but hardly "replace SWEs" type of utility.

          • DonHopkins a day ago

            A good reason to have "300ish advanced PostgreSQL queries" is because an "ORM with DB adapter" is inadequate for your requirements.

      • bonki 15 hours ago

        Why did you not use an SQL transpiler?

    • elfly 2 days ago

      yeah, this is the issue. I've used Claude Code to great success to start a project. Once the basic framework is in place, it becomes less and less useful. I think it cannot handle the big context of a full project.

      It is something that future versions could fix, if the context an LLM can handle grows and also if you could fix it so it could handle debugging itself. Right now it can do that in short bursts and it is not bad at it, but it will get distracted quickly and do other things I did not ask for.

      One of these problems has a technical fix that is only limited by money; the other does not

  • michaelrpeskin 2 days ago

    A little snarky but: In my experience, the folks who are 100x more productive are multiplying 100 times a small number.

    I've found great success with LLMs in the research phase of coding. Last week I needed to write some domain-specific linear algebra and because of some other restrictions, I couldn't just pull in LAPACK. So I had to hand code the work (yes, I know you shouldn't hand code this kind of stuff, but it was a small slice and the domain didn't require the fully-optimized LAPACK stuff). I used an LLM to do the research part that I normally would have had to resort to a couple of math texts to fully understand. So in that case it did make me 100x more effective because it found what I needed and summarized it so that I could convert it to code really quickly.

    For the fun of it, I did ask the LLM to generate the code for me too, and it made very subtle mistakes that wouldn't have been obvious unless you were already an expert in the field. I could see how a junior engineer would have been impressed by it and probably just checked it in and moved on.

    I'm still a firm believer in understanding every bit of code you check in, so even if LLMs get really good, the "code writing" part of my work probably won't ever get faster. But for figuring out what code to write - I think LLMs will make people much faster. The research and summarize part is amazing.

    The real value in the world is synthesis and novel ideas. And maybe I'm a luddite, but I still think that takes human creativity. LLMs will be a critical support structure, but I'm not sold on them actually writing high-value code.

    • ryandrake 2 days ago

      To steal the joke from that vintage coffee tin: LLMs help you "do stupid things faster." If all you want to do is dump terrible code into the world as fast as possible, then you've got a great tool there. If you're looking to slowly and carefully build a quality product on a solid foundation, or create a programming work of art, LLMs are at best a marginally better autocomplete.

    • data-ottawa 2 days ago

      They definitely help raise the floor, but have a much lower ceiling.

      If you’re near that ceiling you get most value out of code reviews and those side features you don’t care about that allow you to focus more on the actual problem.

      That’s a 10-20% lift instead of 10/100x.

    • Brystephor 2 days ago

      > I've found great success with LLMs in the research phase of coding.

      This is what I've found it most helpful for. Typically I want an example specific to my scenario and use an LLM to generate the scenario that I ask questions about. It helps me go from understanding a process at a high level to learning more about what components are involved at a lower level, which lets me then go do more research on those components elsewhere.

  • loveparade 2 days ago

    That's because the reality is somewhere in the middle. It obviously isn't 100x or even 10x except for very specific toy tasks, but for me it's probably around ~1.5-2x in everyday work. And I work on mostly long-tail scientific stuff, I imagine it's more if you do something like frontend dev with a lot of boilerplate. I'm absolutely convinced that people who say LLMs are not making them more productive either don't understand how to use them correctly, or they are working on very specific niche problems/infrastructure where they aren't useful.

    2x may not sound like much compared to what you read in the media, but if a few years ago you had told companies that you can provably make their engineers even 2x more productive on average you'd probably be swimming in billions of dollars now. That kind of productivity increase is absolutely huge.

    • bluefirebrand 2 days ago

      > I'm absolutely convinced that people who say LLMs are not making them more productive either don't understand how to use them correctly, or they are working on very specific niche problems/infrastructure where they aren't useful

      I understand how to use LLMs, it's just a much worse workflow. Writing code isn't the hard part, reviewing code is much slower and painful process than writing it from scratch, when you actually know what you're doing.

      I'm absolutely convinced that people who are saying LLMs are making them more productive either weren't very productive to begin with or they don't know what they're doing enough to realize what a bad job the LLM is doing

    • FiberBundle 2 days ago

      Where's the actual evidence for the productivity boost though? Wouldn't one expect a huge increase in valuable software products or a dramatic increase in open source contributions if llms provide this kind of productivity increase?

    • theappsecguy a day ago

      It's really hard to measure. Sure, maybe 20-30% of the time I get a perfect autocomplete and boom, 2x speedup. Then there are the partially correct or completely incorrect suggestions that tools like Cursor keep spamming, making me either accept them by mistake or continuously rage-hit the escape key, which both slows things down and breaks the thinking flow.

      How do you account for the latter? It's nearly impossible and I have seen no improvement in that realm. As the good suggestions have become better, the ratio of good to bad hasn't really changed for me.

    • mlsu 2 days ago

      If it really is 2x in general, we would have noticed.

      LLM are useful to me personally as a learning tool. It’s valuable to have it talk through something or explain why something is designed a certain way. So it’s definitely helped me grow as an engineer. But I have not come close to being 2x as productive, I don’t think my workload would be done by 2 people in the pre-LLM era. Maybe 1.25?

      Because for the most part being a computer programmer is not about programming the computer but about… everything else.

      Overall, LLM are a net negative for humanity in my opinion. The amount of slop on the internet now is absurd. I can never be sure if I’m reading the writing of a real person or a machine, which makes me feel like I’m slowly becoming brain poisoned by the devil himself on a psychological level. I have to think twice when I read any email longer than 2 paragraphs, or did my coworker start using more em dashes?

      And also if it’s so damn easy for a machine to do all of this complex valuable work for a few kilojoules how complex or valuable is it anyway? Maybe it’s all just a fiction, that any of this stuff provides any value to anyone. Is another app really actually necessary? What about a hundred million new apps?! Even if we all became 100x as “productive,” what would that actually mean, in the real world you know the one that’s outside that has smells and textures and objects and people in it talking to one another face to face with their wet vocal chords??

      These things are going to drive us all absolutely insane I can guarantee it. The logical end of this is complete madness.

    • tonyhart7 2 days ago

      "That kind of productivity increase is absolutely huge."

      then companies decide to hire less, the market gets flooded with junior AI-powered skills, and mid-level programmers can't ask for raises as easily anymore unless you're a senior dealing with really specific stuff that needs know-how

    • codr7 2 days ago

      2x is also .5x in learning, nothing is free in this world.

  • jghn 2 days ago

    > There are two kinds of engineers.

    Apparently there are at least three as I fit neither of these molds of yours. They are neither making me 100x more productive, nor am I putting in an extreme amount of coddling.

    For context, in the 30ish years I've been writing code professionally, I've *always* needed to look stuff up constantly. I know what I want, and I know what's possible. I just can never remember things like syntax, specifics, that sort of thing. It doesn't help that I often wind up in jobs where I'm switching between languages often.

    Before quality search engines like Google, this meant I was always flipping through reference books, man pages, and worse. Then came search engines and that was a step more productive. Then came things like SO, and that was a step more productive. I find LLMs to be another step forward, but not like a quantum leap or anything. Take autocomplete suggestions for instance: Sometimes it's exactly the prompt (pardon the pun) that I need to jog my memory. Other times I know it's not what i want and I ignore it. And the chat interfaces I find better than Googling SO as I can have a back & forth with it.

    • mattfca 2 days ago

      this describes me and my experience exactly

  • melvinroest 2 days ago

    I'm learning math through Math Academy, ChatGPT and YouTube (mostly 3Blue1Brown and similar channels). Without that specific combination, it would've been hell. Now it's just nice.

    A few years ago I did it with something similar to Math Academy (from the University of Amsterdam). ChatGPT wasn't intelligent enough back then so I didn't use it. It felt a lot tougher.

    ChatGPT answers questions that a teacher would find obscure nonsense, but I'm the type of student who needs to know the culture behind math so I can empathize with the people doing it and can think of solutions that way. Like why is the letter m used when talking about an angle? Mathematically, it shouldn't matter. But it irritates me, as I'd use a instead; it makes more sense as you want to lower the cognitive burden and focus as much on the math as possible. So I asked ChatGPT and one of the things it said is that historically it meant "measure". Now I get it and can focus on the actual math again.

    Another example is the fast variance calculation: average_of_squares - square_of_the_average

    how does that come from (1/n) * sum((x - mean)^2) ?

    It just shows it.
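
    For reference, the identity follows from expanding the square (a quick sketch in LaTeX, writing \mu for the mean (1/n) * sum(x_i)):

    ```latex
    \frac{1}{n}\sum_i (x_i - \mu)^2
      = \frac{1}{n}\sum_i \left(x_i^2 - 2\mu x_i + \mu^2\right)
      = \frac{1}{n}\sum_i x_i^2 - 2\mu \cdot \frac{1}{n}\sum_i x_i + \mu^2
      = \frac{1}{n}\sum_i x_i^2 - \mu^2
    ```

    i.e. the average of the squares minus the square of the average.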

    World domination? Nope, but I'm learning something I otherwise wouldn't. But as you can tell, YouTube and Math Academy play their own role in it.

    • layer8 2 days ago

      > why is the letter m used when talked about an angle?

      This isn’t the best example, because a simple Google search gives you immediate results for that as well, and arguably more nuanced ones. You’ll learn, for example, that m is also the standard letter for slope, which may or may not be related.

      Also, if a teacher finds that question “obscure nonsense”, I’d argue that they are a bad teacher.

      • melvinroest a day ago

        Hmm, it's not the best example yea. It's just that ChatGPT understands math notation a lot better than Google. If needed, you can always let ChatGPT do a Bing search.

    • elzbardico 2 days ago

      You'd love a very old book named Mathematics for the Million. The author was a committed Marxist and an engineer, and due to this combination it places a strong emphasis on demystifying concepts, the historical context of mathematical ideas, and their practical application.

      Of course, some of it is a bit strange for a modern audience; the way lots of things in calculus are taught nowadays is completely different. If you are looking for rigorous formal proofs, you won't find them in this book.

      But it is a quite funny and quirky book that gives you a different perspective on math concepts and will satisfy your hunger for knowing the people and the historical context behind it.

    • namaria 2 days ago

      > So I ask ChatGPT and one of the things it said is that historically it meant "measure"

      This is a great example of why using LLMs to learn is perilous. This is nonsense.

      • melvinroest 2 days ago

        Is it? I didn't bother to check as it wasn't too relevant. I wasn't stuck anymore. Worst case it's a mnemonic that is untrue. Best case it's true. For actual derivations I check the work.

        This is what it wrote by the way.

        I posted an image and asked:

        Why is this m

        m for angle mangle?

        ChatGPT:

        In geometry we often need to distinguish the angle itself (a geometric object) from the numerical size of that angle (a real-number measurement, usually in degrees or radians). To keep the two ideas separate we put a little “m” in front of the angle symbol:

        m∠B reads as “the measure of angle B.”

        * ∠B refers to the actual wedge-shaped region determined by the two rays that meet at vertex B.

        * m∠B is the single number (e.g. 90∘) that tells you how “wide” that wedge is.

        So in the worked example the author writes

        m∠A + m∠B + m∠C + m∠D = 360∘

        because they are adding the sizes of the four interior angles of a quadrilateral, not the angles themselves.

        If you dropped the m you’d be adding geometric objects, which doesn’t make sense; the m reminds the reader that what’s being added are real-number measures. (Some textbooks use ∣∠B∣ or simply write the letter with a bar over it, but “m∠” is by far the most common in U.S. and many international curricula.)

        So the mysterious “m” isn’t another variable—just a shorthand for “measure of.”

        ---

        Edit: this took quite some work to write well as ChatGPT answers don't copy/paste neatly into HN. So I won't be doing this for the other example.

        • ants_everywhere 2 days ago

          ChatGPT is right, although I'm not sure how historical the notation is.

          ∠ is traditionally a function from points to axiomatic geometric objects. ∠ABC is the angle at B oriented so that we start at A, go to B, then to C.

          Your text seems to be using ∠ either as a kind of type annotation (indicating by ∠B that B is an angle) or (perhaps more likely) is just suppressing the other letters in the triangle and is short for something like ∠ABC.

          Since ∠B is an axiomatic Euclidean object, it has no particular relation to the real numbers. m is an operator or function that maps axiomatic angles to real numbers in such a way that the calculations with real numbers provide a model for the Euclidean geometry. Why call it m? I'm not aware of it being historical, but almost certainly it comes from measure, like the μ in measure theory.

          Obviously ∠ is a graphical depiction of an angle, and my guess is it probably evolved as a shorthand from the more explicit diagrams in Euclid.

          Traditionally angles are named with variables from the beginning of the Greek alphabet: α, β, γ. Then we skip to θ presumably to avoid the Greek letters that look nearly identical to Roman letters.

          • melvinroest 2 days ago

            "I'm not sure how historical the notation is."

            I conflated this with another ChatGPT conversation where it gave 3 possible historical sources for another symbol that I stumbled over and then had trouble proceeding past.

        • card_zero 2 days ago

          See slope: https://en.wikipedia.org/wiki/Slope

          It isn't customarily used for angles (those get Greek letters).

          The m stands for mystery.

          Edit: ah, but I see that this prefixed m for "measure" is also used sometimes. It appears at https://en.wikipedia.org/wiki/Angle#Combining_angle_pairs though I'm not sure why it's necessary. Maybe because you want to clarify absolute magnitude is meant, to avoid adding negative values.

          • evertedsphere 2 days ago

            yes, it's the equivalent of |AB| for a line segment and is not uncommon in high school maths texts in some parts of the world

        • namaria 2 days ago

          I thought it was a different kind of nonsense, but it still has a subtle error. Goes to show even more how risky it is to repeat LLM answers as factual.

          • melvinroest 2 days ago

            Could you point out what you mean? It's really hard to follow you. You say it's nonsense but it is not clear why. Then I write down a response that took me quite a while to format correctly (and to look up) and you then say "I thought it was a different kind of nonsense." Then you say it still has a subtle error.

            It is really hard to follow you if you don't explain yourself.

            I'm not saying it's factual. The reason I showed that answer was simply to verify to see if it was what you thought it was (hence I asked "is it?"). It turns out that it wasn't fully.

      • wolfhumble 2 days ago

        > This is a great example of why using LLMs to learn is perilous. This is nonsense.

        These type of answers from teachers, co-students, web communities, blogs etc. are – I would assume – why people ask LLMs in the first place.

        • namaria 2 days ago

          It is a problem that people who are unwilling to perform some basic research resort to 'learning' the output of LLMs. No one is entitled to answers.

          • melvinroest 2 days ago

            "No one is entitled to answers" feels very definitive, defeating and tiring. Especially because you don't explain your own thought process.

            Could you please assume a good faith discussion?

            • namaria a day ago

              I have. I was criticized for pointing out spurious nonsense in LLM slop by someone who claimed people wouldn't have to resort to it if other people made an effort to explain things better.

              But I don't believe anyone is entitled to an explanation. I find things out by looking up books and testing things. Any explanation someone deigns to give me is a bonus and doubted until corroborated.

              I don't know why anyone would think they are owed a custom explanation for their specific questions and thinking like that will get you in trouble when you come to depend on what anyone (or anything) is willing to chew up for you.

              Maybe I was terse but I don't think I was rude or illogical.

              • melvinroest a day ago

                > I was criticized for pointing out spurious nonsense in LLM slop

                I can see that you experience it as such but I think it's more of a spectrum. Often times, LLMs give good answers. Often enough, they don't. One needs to keep that in mind. In my example, given that it was just a symbol, all I needed was knowledge at the level of a mnemonic, which would, on average, at least somewhat point directionally towards the truth. But that's a bonus. I could make up a mnemonic myself, but I like having that bonus.

                Given that ChatGPT points directionally towards the truth, but not fully (on average), I'd need to test it or verify it if I want a better level of knowledge than that. In that case, ChatGPT basically acts as a sort of cache, as it's quicker to ask ChatGPT a question than to research it on one's own. One can experience a cache hit or a cache miss; that shows up in the verification stage. Specifically for math, this is quicker, in my experience.

                But anyways that's my experience. Your experience is that it's spurious nonsense slop. And I suppose you therefore find it a problem. I don't see the issue as there are different levels of knowledge and different time commitments you need to give to them. A lot of my knowledge is based on trust anyway and sometimes it's broken (e.g. replication crisis in psychology, I felt betrayed having studied the field).

                > I don't know why anyone would think they are owed a custom explanation for their specific questions

                I'm not sure if anyone said anything like it. Regardless of that, the need still exists. People will still act on that need. I suspect you see that as a problem. I'm neutral on it.

                > thinking like that will get you in trouble when you come to depend on what anyone (or anything) is willing to chew up for you.

                IMO teaching and learning is a 2 way street. It's the teacher's job to explain it well enough. It's the student's job to do their best to understand it. Math Academy offers exercises and explanations. Sometimes I find their explanations a bit lacking. So I use other sources to augment it.

                > Maybe I was terse but I don't think I was rude or illogical.

                Reading/writing text is tough, which is why I stated how I felt. It'd probably have been easier in an actual conversation. I didn't mean to imply you were being rude or illogical.

  • lm28469 2 days ago

    That's because they fall under the "engineers" category like my local postman and Michael Schumacher both fall under the "driver" category.

    Most people don't work on anything technically hard, most problems are business logic issues that aren't solved technically or legacy code workarounds for which you need to put 3-10 domain experts in a room for a few hours to solve.

    • sunrunner 2 days ago

      There are a lot of terms in software development that have been co-opted from other disciplines and misrepresent a lot of development work, including 'engineering' and 'architecture'.

      I think it's helpful to think of engineering as a _process_ instead of a role, and the reality is that a lot of development work doesn't necessarily rely on the strong engineering methodology (e.g. measurement, material properties, tolerances, modelling, etc.) that the people developing the software might imagine just based on the number of job adverts for 'engineers'.

      This isn't a bad thing. There are hundreds or thousands of different but equally valid solutions to getting a program to do a thing, and recognising that most code writing sits somewhere between art and engineering (it is neither a purely artistic discipline nor, in most cases, a purely engineering one) is useful.

      The kinds of engineering and architecture that people think of in software development only really represent common practices and shared language (e.g. design patterns, architectural patterns) and not a strong engineering practice or any kind of truth about how software actually runs.

      (Unless you're writing software for launching rockets, in which case the engineering _process_ probably should be strong).

      • cassianoleal 2 days ago

        > the reality is that a lot of development work doesn't necessarily rely on the strong engineering methodology (e.g. measurement, material properties, tolerances, modelling, etc.)

        It's probably true that a lot of development work doesn't rely on those. It's probably also true that the work other kinds of engineers do often doesn't either.

        That said, when engineering software systems, those are very important. Measurement: resource sizing, observability; tolerances: backoffs, back pressure, queues, topics, buffers; modelling: types, syntax, data analytics...

        There's a whole class of developers out there who are not aware of, or not very good at, those things. And that's fine. There's a place for them in the market. You don't need an engineer to work on your floor joists or your plumbing. Sure, you can have one, but you can also hire a builder or DIY it all yourself.

    • voidhorse 2 days ago

      Completely agree. If anything LLMs have just made this completely transparent and are revealing that actual engineering in software is limited to rare cases.

      All of these middling developers who are so excited about these tools don't seem to realize that they are perfectly primed to eliminate precisely the kind of worker responsible for hooking up existing APIs for solved problems into a web app. That kind of work, which the market has hitherto greatly over-esteemed and overpaid for, isn't going to be so esteemed or highly valued anymore. The only real work that will remain is actual hard engineering work (solving novel technical modeling problems, not just plugging APIs together). All of these lazy devs are hyped for precisely the automation that's going to significantly reduce their pay and labor prospects in the long term, lol. I'm shocked at times at how people can fail to see the writing on the wall when it's been written in gargantuan red ink.

      • esafak 2 days ago

        Software engineering work never ends, so they'll just work faster.

    • nobodywillobsrv 2 days ago

      So true, and it makes me feel better to have this mental model. I find LLMs useful for simple things that are perhaps just beyond search and copy-paste, but I sometimes wonder if the more technical output is not even quite right, since it's possible to sort of brainwash yourself into not thinking clearly while nagging an LLM into being reasonable about something it has a hard time with.

    • mbac32768 2 days ago

      Right yes, I imagine most people who program are in companies that have nothing to do with high tech, writing SQL or scripts or CRUD apps. LLMs are probably amazing here.

      • t917910 2 days ago

        Writing SQL in the context of a 20-year-old application with 2,500 stored procedures in just one of many databases is not so easy for an LLM.

  • SatvikBeri 2 days ago

    I find LLMs make me about 10% more productive – which is great, but nowhere near as extreme as some claims.

    Cursor autocompletes a lot of stuff I would have typed anyway, and LLM search is a strong complement to Google, though not a direct replacement. Generating more than a few lines usually gives me bad code, but often suggests libraries I didn't know about, especially outside my area of expertise.

    There just aren't a lot of people talking about middling results, because the extremes tend to eat up the conversation.

    • codr7 2 days ago

      It's not a free lunch though, you're sacrificing reinforcement learning and becoming more dependent on your tools.

    • jimbokun 2 days ago

      I do find AI a replacement to Google most of the time, because now when I search on Google the Gemini results at the top generally give me what I wanted to know.

  • slightwinder 2 days ago

    Even if someone seriously works 100 times faster with AI, it doesn't mean they can easily dominate any established market. A product is more than just code, and a company is more than its headcount. There is domain knowledge about the product and market. There is the personal ability of the workers and the synergy that comes with it, plus the strength of established structures grown over years and decades. Just the habit and trust of users is a big fortress every new product has to conquer. That's not something some random dev or team of devs can easily overcome just by being 100 times faster.

    And realistically, there is also the question of where this claim is coming from. Someone being 100 times faster with AI is probably not starting from a place of high competence. Of the 100x, 90x are probably just filling the gap to peak devs. So in the end they would probably be about as productive as a highly competent dev in some domain, except that they can more easily reach into any domain without years of experience in it. And if we are honest, we somewhat have that already without AI, just from people copying and cookie-cutting any sh*t

  • GenerocUsername 2 days ago

    At a macro level, people from the former group might be doing better than you think.

    I have gone from Senior II, to Lead Architect, to Principal Engineer in the years LLMs have been available.

    It's not literally a 100x multiplier, but I have leveraged their ability to both fill in details where I lack specific knowledge, and as a first point of contact when researching a new space.

    LLMs have enabled me to widen my sphere of influence by deepening my abilities at every point of my T-shaped knowledge curve.

    • codr7 2 days ago

      Except you're not deepening anything, rather becoming increasingly dependent on wacky tools to keep up your fake status. Go for it, but it's a recipe for disaster imo.

  • jf22 3 hours ago

    There is a gradient in here. The difference between the two isn't that extreme.

  • sanderjd 2 days ago

    Nah, there's a third kind, which is probably the quieter majority. These are people who find these to be useful but imperfect tools, whose expectations remain realistic, and for whom the continuing progress these tools are making is pleasant and not exactly surprising but also not taken as a given.

    I'm finding that these tools are supporting an ever-expanding portion of my work, while remaining frequently frustrating.

  • binary132 2 days ago

    I think a big part of the problem is the devx / workflows that these tools tend to encourage or impose. Watching my coworkers fumble with terrible copilot autoshit in a pairing session makes me want to scream on a daily basis. That doesn’t necessarily mean such a DX is the only way it could or should work, or that it’s even a good DX at all. Most of the time, it actually imposes more cost, both in lost momentum and in corrective action, than it relieves.

    I think something like a graph where nodes are specified clearly with inputs and outputs (or pre- and post-conditions, if you prefer), and agents iteratively generate and refine the implementation details could have some utility, but that would require programmers to know how to think about program design (and have a usable UI/UX for such a tool) instead of hastily stumbling through patches and modifications to existing code they already don’t understand.

    The new LLM systems are carrying us farther from that world, not closer to it.

  • conductr 2 days ago

    My guess is it depends on what you work with. I've been impressed by my ability to spin up nice-looking websites and landing pages. Yet, I've tried to build a game, and despite it appearing to work, it eventually got confused about its own codebase and I end up with a ball of spaghetti where yesterday's requirements get lost or broken when adding functionality today. It feels more like I'm chasing my tail, vibe coding as they say; it just seems endlessly circular.

    My game dev is going to be a hybrid approach going forward. I’ll ask it to create a new project with the desired functionality and see how the solution was proposed and then incorporate that into my main code branch.

    I find this easier than Cursor's roll back or being overly cautious with suggested edits. I'm just getting to this realization, so TBD if it works well. FYI, I'm good at programming generally, but game dev is my Achilles' heel. We'll see how it goes.

  • exe34 2 days ago

    I've managed to use ChatGPT to write a small app in Kotlin, and later used Cursor to fix some issues around performance. Personally I'm not a finish-it kind of guy unless I'm getting paid for it.

    To be fair, I'm not sure the quality of the code is much worse than all the boatloads of crapware in all the app stores out there.

    I suspect that the reason "revolutionary products" and "blazing fast iteration" isn't blowing our minds is because the problem was never the code - it was always the people around it, who know better than their customers. The best way to make good software is to test it on real people and iterate - that's going to take a long time no matter how fast the coding part is.

  • traceroute66 2 days ago

    > And those who, like you, find it to be an extremely finicky process that requires extreme amount of coddling to get average results at best.

    I am very much in that category.

    I would describe my experience as the old adage "why keep a dog and bark yourself".

    And no, I don't buy into people who say you have to engage in "prompt engineering". That's bullshit. Let's face it "prompt engineering" is a synonym for "I have to waste time coming up with hacky workarounds to make this dumb LLM come up with something usable".

    In the end, it's always quicker and easier to do it myself. And unless you are fresh out of school, if you've got more than 10 minutes of coding experience, you will always do a better job than the LLM.

    Every time a new LLM model comes out, I fall into the same old trap "oh well, this one must be better, surely" I say to myself.

    And yet it's always the same "you can't polish a turd but you can roll it in glitter" experience.

    I'm sick and tired of LLMs hallucinating shit.

    I'm sick and tired of LLMs inventing functions that have been available in stdlib for years.

    I'm sick and tired of LLMs generating useless code with boilerplate subfunctions that just contain a single commented "implement it yourself" line.

    On top of that LLMs are simply terrible for the environment, guzzling up all that electricity, producing nothing but hot air and bullshit results in return.

    Basically I'm sick and tired of LLMs and all the LLM fetishists that surround them.

  • tom_m 2 days ago

    It's a good sounding board and productivity tool...but it's over hyped.

    I think it'll calm down a bit. People are just making huge claims right now which is music to the ears of many.

    I like the tools, don't get me wrong, but they aren't going to make a huge difference day to day. The area where they will make a huge difference is in helping people work out issues faster. You know those multi-hour or multi-day issues you run into where you go crazy reading and re-reading the code? The LLMs are good at giving you a fresh look at the code that you'd otherwise need to pull more people in for, or sleep on for a day.

    This is probably the most significant time savings AI code editors or agents will ever provide.

  • max_on_hn 2 days ago

    (disclaimer: I have a vested interest in the space as the purveyor of an AI software development agent)

    The result you described is coming soon. CheepCode[0] agents already produce working code in a satisfying percentage of cases, and I am at most 3 months away from it producing end-to-end apps and complex changes that are at least human-quality. It would take way less if I got funded to work on it full time.

    Given that I'm this close as a solo founder with no employees, you can imagine what's cooking inside large companies.

    [0] My product, cloud-based headless coding agents that connect directly to Linear, accept tickets, and submit GitHub PRs

    • johnecheck 2 days ago

      "At most 3 months away"

      This is a strong claim. I'm not aware of any AI model or system that can consistently and correctly make complex changes in existing codebases. What makes you so confident that you can achieve this, alone, in the next 3 months?

      • t917910 2 days ago

        The circumference of a circle is easily calculated as 2 × π × radius, so calculating the same length for an ellipse would not be much more difficult. Hehe.
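
        (Spelled out, for anyone who missed the sarcasm: the circle has a closed form, while the ellipse perimeter only has an elliptic-integral expression, roughly:

            C_{\text{circle}} = 2\pi r, \qquad C_{\text{ellipse}} = 4a \int_0^{\pi/2} \sqrt{1 - e^2 \sin^2\theta}\, d\theta

        with a the semi-major axis and e the eccentricity -- there is no elementary closed form.)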

  • peacebeard 2 days ago

    I don’t fit in either category. My experience is that LLMs are good at writing code that is easy to write. This is not game-changing, but it is useful.

  • dogtierstatus 2 days ago

    This is a quality vs. quantity issue. There were some number X of extremely competent programmers and roughly 10X moderately competent ones.

    LLMs have only taken that to 100X moderately competent programmers, and maybe 2x'd the extremely competent ones?

    • analog31 2 days ago

      Even 2x would be an improvement. It would mean that a project goes from being 3 years late, to only 1.5 years late.

  • infecto 2 days ago

    You’ll usually find three kinds of people talking about LLMs online:

    1. The ones writing Medium posts about being 100x more productive.

    2. The skeptics, like yourself, who construct made-up examples to prove it’s impossible to get value, framing things in a way where failure is the only plausible outcome.

    3. And then there’s the middle group: people quietly getting real, uneven, but worthwhile value.

    You’d be surprised. The vast majority of engineering isn’t that unique. Usefulness often depends on the age of the codebase and the dynamics of the team. I’m firmly in the middle camp: far from perfect, but I get real value from these tools as they stand today. It’s like having a junior engineer who can scaffold out the basics so I can spend my mental energy on the parts that actually matter.

    • soraminazuki 2 days ago

      GP didn't "construct made-up examples" anywhere in this thread as of the time of writing this post. Where did you hallucinate that from?

      Seriously, why is every comment hyping up AI generated code like this. No concrete evidence, just turtles all the way down. But when actual examples do show up, it's a complete clown show [1][2].

      [1]: https://news.ycombinator.com/item?id=44050152

      [2]: https://news.ycombinator.com/item?id=43907376

      • infecto a day ago

        He definitely crafted the narrative he wanted, you see that just as much from skeptics as from the hype crowd. I still stand by what I said: there are always three camps in any LLM discussion. No hallucinations here, I can see where you land. I’m here for a constructive argument, and I believe there’s a real middle ground where LLMs are creating tangible value, even for experienced engineers.

        Your links about GitHub agents may be spot-on, I haven’t used them myself, so no strong opinion there. Same with Codex; it’s early. But why not mention tools like Cursor or Windsurf? Or the open source alternatives. That’s where people are seeing actual gains today. Why go all-in on anti-hype without talking about your direct experience?

  • kordlessagain 2 days ago

    “If we do not invent the future, someone else will do it for us - and it will be their future, not ours.” – Duncan D. Bruce & Dr. Geoff Crook, The Dream Café

  • emp17344 2 days ago

    The fact that no one can agree on how useful these things are makes me think they aren’t that useful.

  • wooque 2 days ago

    Because code is a small part of making a successful business and usually not the bottleneck.

  • gofreddygo a day ago

    There's a third kind that agrees these are stupid but strongly believes, beyond doubt, they'll be 100x better in 2 years.

    Everything AI generated today is hot garbage but hey, just wait 2 years and the road to eldorado will show itself.

    pfft.

  • bobxmax 2 days ago

    They are.

    People are building sophisticated products in days not months now. If you can't see that, it's your issue.

    • gyomu 2 days ago

      Would you be so kind as to link to some of these sophisticated products? Would love to check them out.

      • bobxmax 2 days ago

        Go look at the latest YC batch. 30% of them were built with 90% of the code being written by LLMs.

        • Jensson 2 days ago

          Where did you see that stat? And why just 30%, shouldn't almost all of them be built by LLM if LLM is that much better?

          Edit: This says a quarter, where did you get the 30% figure from?

          https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-y...

          • viridian 7 hours ago

            Perhaps bobxmax should have asked an LLM what the exact number was instead of relying on his outdated & error prone human memory.

          • bobxmax a day ago

            wow 25% not 30% ya sure got me

            > shouldn't almost all of them be built by LLM if LLM is that much better

            a) not every yc company is a software company

            b) these things take time, and not everyone knows how to use these tools

            this stuff should be pretty self evident.

          • bookofjoe 2 days ago

            Are you seriously pointing out the difference between 25% and 30% as meaningful?

    • codr7 2 days ago

      Good luck with what, truth will find you and keep you accountable.

TimPC 2 days ago

LLMs give you superpowers if you work on a wide range of tasks and aren't a deep expert on all of them. They aren't that great if you know the area you are working in inside and out and never stray from it.

For example, using an LLM to help you write a Dockerfile when you write Dockerfiles once a project and don't have a dedicated expert like a Deployment Engineer in your company is fantastic.

Or using an LLM to get answers faster than google for syntax errors and other minor issues is nice.

Even using LLM with careful prompting to discuss architecture tradeoffs and get its analysis (but make the final decision yourself) can be helpful.

Generally, you want to be very careful about how you constrain the LLM through prompts to ensure you keep it on a very narrow path so that it doesn't do something stupid (as LLMs are prone to do), you also often have to iterate because LLMs will occasionally do things like hallucinate APIs that don't actually exist. But even with iteration it can often make you faster.

  • igor47 2 days ago

    > For example, using an LLM to help you write a Dockerfile when you write Dockerfiles once a project and don't have a dedicated expert like a Deployment Engineer in your company is fantastic.

    This made me think of gell-mann amnesia: https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect

    Basically, when you understand the problem domain, you know the LLM is generating crap that, as the OP says, you're unwilling to put your name on. But when it's in a domain you don't know, you forget how crappy the LLM normally is and go with the vibe.

    • ChrisClark 2 days ago

      As long as it's a simple enough problem and it works, I'm happy. The other day I needed to do something in powershell, which I probably use twice a year. But an LLM was just able to spit out what I needed for me.

      I find it's the best for that but once things get complicated, you're just arguing with the LLM, when you should have done it yourself.

      • zrobotics 2 days ago

        I'm pretty sure I've commented this before, but it's just too ridiculous not to share again. About a year ago I needed to convert ~25k png images to webp. I happened to have them on a Windows machine, and I don't know powershell at all. I figured this would be a good thing for AI to do, so I asked the free version of chatGPT to write me a script to do the conversion. I was expecting a one-liner script, since I know this would be a simple bash script and there are command line tools available for this conversion on Windows.

        I must say, the solution was very creative. It involved using powershell to script AutoHotkey to open Photoshop and use AHK to automate exporting each image as a webp, closing Photoshop after exporting each image. I don't have a Photoshop license, and I don't know why powershell would be needed to script another scripting tool. I also suspect that Photoshop or another Adobe tool probably has a bulk converter. But I do need to give points for originality; that is exactly the type of harebrained solution I would sarcastically suggest if I wanted to mildly troll a coworker.
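
        (For the record, the whole job really is only a few lines. A rough sketch with Pillow, assuming it's installed with WebP support -- the folder path is made up:

            from pathlib import Path
            from PIL import Image  # Pillow, assumed built with WebP support

            src = Path(r"C:\images")  # hypothetical folder holding the ~25k PNGs
            for png in src.rglob("*.png"):
                # write a sibling .webp next to each .png
                Image.open(png).save(png.with_suffix(".webp"), "WEBP")

        No PowerShell, AutoHotkey, or Photoshop required.)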

y42 2 days ago

Seconded! I'm using LLMs in many different ways—like you—starting with small troubleshooting tasks, quick shell scripts, coding, or simply asking questions.

I use a wide variety of tools. For more private or personal tasks, I mostly rely on Claude and OpenAI; sometimes I also use Google or Perplexity—whichever gives the best results. For business purposes, I either use Copilot within VSCode or, via an internal corporate platform, Claude, OpenAI, and Google. I’ve also experimented a bit with Copilot Studio.

I’ve been working like this for about a year and a half now, though I haven’t had access to every tool the entire time.

So far, I can say this:

Yes, LLMs have increased my productivity. I’m experimenting with different programming languages, which is quite fun. I’m gaining a better understanding of various topics, and that definitely makes some things easier.

But—regardless of the model or its version—I also find myself getting really, really frustrated. The more complex the task, the more I step outside of well-trodden paths, and the more it's not just about piecing together simple components… the more they all tend to fail. And if that’s not enough: in some cases, I’d even say it takes more time to fix the mess an LLM makes than it ever saved me in the first place.

Right now, my honest conclusion is this: LLMs are useful for small code completion tasks, troubleshooting and explaining —but that’s about it. They’re not taking our jobs anytime soon.

avalys 2 days ago

I find LLMs very useful when I need to learn something new, e.g. a new library or API, or a new language. It is much faster to just ask the LLM how to render text in OpenGL, or whatever, than read through tons of terrible documentation. Or, if I have a bunch of repetitive boilerplate stuff that I have no good templates for or cannot borrow from another part of the codebase. But when it comes to what I consider the actual 'work', the part of my codebase that is actually unique to the product I'm building, I rarely find them valuable in the "letting them write the code for me" sense.

I reflected once that very little of my time as a senior engineer is actually spent just banging out code. The actual writing of the code is never the hard part or time-consuming part for me - it's figuring out the right architecture, figuring out how to properly refactor someone else's hairball, finding performance issues, debugging rare bugs, etc. Yes, LLMs accelerate the process of writing the boilerplate, but unless you're building brand new products from scratch every 2nd week, how much boilerplate are you really writing? If the answer is "a lot", you might consider how to solve that problem without relying on LLMs!

  • melvinroest 2 days ago

    > I find LLMs very useful when I need to learn something new, e.g. a new library or API, or a new language. It is much faster to just ask the LLM how to render text in OpenGL, or whatever, than read through tons of terrible documentation.

    LLMs are better at reading terrible documentation than the average programmer. So in that sense ("read obscure text and explain it better") there seems to be a clear value add.

    > If the answer is "a lot", you might consider how to solve that problem without relying on LLMs!

    Aren't there languages with a lot of boilerplate though?

    • nottorp 2 days ago

      > LLMs are better at reading terrible documentation than the average programmer.

      LLMs don't go and read the terrible documentation for you when prompted. They reproduce the information posted by other people that struggled with said terrible documentation, if it was posted somewhere.

      It's still better than a modern web search or struggling with the terrible documentation on your own - for introductory stuff.

      For going into production you have to review the output, and reading code has always been harder than writing it...

      • squidbeak 2 days ago

        This is wildly incorrect. Documentation categorically isn't excluded from LLMs' training sets, and they are very well able to summarize that documentation when asked.

        • nottorp 2 days ago

          I believe you missed "when prompted".

  • belZaah 2 days ago

    There’s no silver bullet. Conceptualization is the hard part, exactly as you say. Mythical man-month is an important text and should be studied more.

    • namaria 2 days ago

      And also the seminal papers by Gödel and Turing, and the books on cybernetics (robotics and automation in the 1950s) by Ashby and Wiener.

      The limits of what can be automated have been clear for almost a century now. Yet people insist on selling ways to automate your way out of social, political, and economic problems. Come to think of it, people should also read Conway's paper as well.

elzbardico 2 days ago

There's no silver bullet.

It is amazing how in our field we repeatedly forget this simple advice from Fred Brooks.

In my experience, LLMs are way more useful for coding and less problem-prone when you use them without exaggerated expectations and understand that it was trained on buggy code, and that of course it is going to generate buggy code. Because almost all code is buggy.

Don't delegate design to it; use functional decomposition, do your homework, and then use LLMs to eliminate toil, to deal with the boring stuff, to guide you in unfamiliar territory. But LLMs don't eliminate the need for you to understand the code that goes out with your name on it. And if you think a piece of LLM-generated code is perfect, remember that the defects may still be there and you need to improve your own knowledge and skills to find them. Always be suspicious; don't trust it blindly.

  • infecto a day ago

    Wish this was higher up. I fully agree, and I'm in the same camp. I am not sure why, but the vast majority either treat it as a magic bullet or say it's pointless and nothing but flaws.

    It’s not perfect it might write buggy code but I write buggy code too and would guess most engineers do. It takes a lot of the menial tasks out of the way. Sure I might spruce up some of the code it writes but why is that a deal breaker?

  • esafak 2 days ago

    I agree with what you said except for the delegation of design (if by that you mean architecture), because I find it is good at that too. When it comes to design, I ask it to describe the design and iterate at a high level before proceeding to implementation. Just as one would in real life.

    • infecto a day ago

      I tend to agree with you, with the caveat that I believe it's important to fully understand what's being built. Not suggesting you think either way.

gwd 2 days ago

Basically yes, once the "problem" gets too big, the LLM stops being useful.

As you say, it's great for automating away boring things; as a more complicated search & replace, for instance. Or, "Implement methods so that it satisfies this interface", where the methods are pretty obvious. Or even "Fill out stub CRUD operations for this set of resources in the API".

I've recently started asking Claude Opus 4 to review my patches when I'm done, and it's occasionally caught errors, and sometimes has been good at prompting me to do something I know I really should be doing.

But once you get past a certain complexity level -- which isn't really that far - it just stops being useful.

For one thing, the changes which need to be made often span multiple files, each of which is fairly large; so I try to think carefully about which files would need to be touched to make a change; after which point I find I have an idea what needs to be changed anyway.

That said, using the AI like a "rubber duck" programmer isn't necessarily bad. Basically, I ask it to make a change; if it makes it and it's good, great! If it's a bit random, I just take over and do it myself. I've only wasted the time of reviewing the LLM's very first change, since nearly everything else is work I would have had to do anyway if I'd written the patch from scratch.

Furthermore, I often find it much easier to take a framework that's mostly in the right direction and modify it the way that I want, than to code up everything from scratch. So if I say, "Implement this", and then end up modifying nearly everything, it still seems like less effort than starting from scratch myself.

The key thing is that I don't work hard at trying to make the LLM do something it's clearly having trouble with. Sometimes the specification was unclear and it made a reasonable assumption; but if I tell it to do something and it's still having trouble, I just finish the task myself.

morsecodist 2 days ago

I have had a pretty similar experience to you. I have found some value here:

- I find it is pretty good at making fairly self-contained react components or even pages especially if you are using a popular UI library

- It is pretty reliable at making well-defined pure functions and I find it easier to validate that these are correct

- It can be good for boilerplate in popular frameworks

I sometimes feel like I am losing my mind because people report these super powerful end-to-end experiences and I have yet to see anything close in my day-to-day usage despite really trying. I find it completely falls over on a complete feature. I tried using aider and people seem to love it, but it was just a disaster for me. I wanted to implement a fairly simple templated email feature in a Next.js app. The kind of thing that would take me about a day. This is one of the most typical development scenarios I can imagine. I described the feature in its entirety and aider completely failed, not even close. So I started describing sub-features one by one and it seemed to work better. But as I added more and more, existing parts began to break; I explained the issues to aider and it just got worse and worse with every prompt. I tried to fix it manually but the code was a mess.

kragen 2 days ago

> I asked friends who are enthusiastic vibe coders and they basically said "your standards are too high".

Sure, vibe coders by definition can't have any standards for the code they're generating because by definition they never look at it.

> Is the model for success here that you just say "I don't care about code quality because I don't have to maintain it because I will use LLMs for that too?"

Vibe coding may work for some purposes, but if it were currently a successful strategy in all cases, or even narrowly for improving AI, Google AI or DeepSeek or somebody would be improving their product far faster than mere humans could, by virtue of having more budget for GPUs and TPUs than you do, and more advanced AI models, too. If and when this happens you should not expect to find out by your job getting easier; rather, you'll be watching the news and extremely unexpected things will be happening. You won't find out that they were caused by AI until later, if ever.

GBintz 4 hours ago

I don't understand the hesitation with existing large codebases. Maybe everything I work on is just easy compared to yall.

It has been unimaginably helpful in getting me up to speed in large existing codebases.

First thing to do in a codebase is to tell it to "analyze the entire codebase and generate md docs in a /llmdocs directory". Do this manually in a loop a few times and try a few different models. They'll build on each other's output.

Chunk, embed, and index those rather than the code files themselves. Use them for context, and pull full code files through tool calls when needed.
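
A rough sketch of that chunk-and-embed step in Python (the paths, chunk size, and embed_and_index call are placeholders -- plug in whatever embedding/vector store you already use):

    from pathlib import Path

    DOCS_DIR = Path("llmdocs")   # the generated markdown docs from the step above
    CHUNK_CHARS = 2000           # arbitrary chunk size, tune to your embedder

    def chunk_markdown(text, size=CHUNK_CHARS):
        # Split on blank lines and pack paragraphs into ~size-character chunks.
        chunks, current = [], ""
        for para in text.split("\n\n"):
            if current and len(current) + len(para) > size:
                chunks.append(current.strip())
                current = ""
            current += para + "\n\n"
        if current.strip():
            chunks.append(current.strip())
        return chunks

    for md in sorted(DOCS_DIR.rglob("*.md")):
        for i, chunk in enumerate(chunk_markdown(md.read_text())):
            # embed_and_index(chunk, source=f"{md}:{i}")  # hypothetical indexing call
            print(md, i, len(chunk))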

lordkrandel 2 days ago

LLMs are good for throwaway code: easier to write, harder to maintain, diagnose, and repair, even with LLMs. And throwaway code is most of the code that isn't a product.

Fast food, assembly lines, and factories may be analogies, but there is a HUGE catch: when a machine with a good setup makes your burger, car, or wristwatch, you can be 99.99% sure it is as specified. You trust the machine.

With LLMs, you have to verify each single step, and if you don't, it simply doesn't work. You cannot trust them to work autonomously 24/7.

That's why you ain't losing your job, yet.

ZaoLahma 2 days ago

I would never use an LLM for trying to generate a full solution to a complex problem. That's not what they're good at.

What LLMs are good at and their main value I'd argue, is nudging you along and removing the need to implement things that "just take time".

Like some days back I needed to construct a string with some information for a log entry, and the LLM that we have suggested a solution that was both elegant and provided a nicer formatted string than what I had in mind. Instead of spending 10-15 minutes on it, I spent 30 seconds and got something that was nicer than what I would have done.

It's these little things that add up and create value, in my opinion.

  • rollcat 2 days ago

    > What LLMs are good at and their main value I'd argue, is nudging you along and removing the need to implement things that "just take time".

    I was still learning Java at uni (while being a Python/Lisp fanboy) when I realised this:

    - Complex and wordy languages need tooling (like autocomplete, autoformatting) to handle the tedious parts.

    - Simple and expressive languages can get away with coding in notepad.exe.

    - Lisp, as simple and powerful as it is, needs brace highlighting. You. Simply. Just. Can't.

    Now 10, 20 years later you can look back at the evolution of many of these languages; some trends I've observed:

    - Java, C#, C++, have all borrowed a lot from functional languages.

    - JVM has Clojure.

    - Go stubbornly insists on "if err != nil" - which isn't even the worst part; the following fmt.Errorf is.

    - Rust (post-1.0) cautiously moved towards "?"; Zig (still pre-1.0) also has specific syntax for errors.

    - Python is slowly getting more verbose, mostly because of type annotations.

    - Autoformatters are actually great, you don't even have to care about indenting code as you spit it out, but... Python, being whitespace-sensitive, makes them a bit less useful.

    Good tooling helps you with wordy languages. Expressive languages help you write concise code. Code is read much more often than it's written. Deterministic tooling can work with the structure of the code, but LMs (being probabilistic) can help you work with its intent. Language models are an evolution of automated tooling - they will get better, just like autocomplete got better; but they will never "solve" coding.

    In my opinion, there's no truth here, only opinions.

terrut 2 days ago

I am one of those lazy IT guys that is very content working in support and ops. I understand a lot of programming concepts and do a bit of automation scripting but never really bothered to learn any proper language fully. Vibe coding was made for people like me. I just need something that works, but I can still ask for code that can be maintained and expanded if needed.

It finally clicked for me when I tried Gemini and ChatGPT side by side. I found that my style of working is more iterative than starting with a fully formed plan. Gemini did well on oneshots, but my lack of experience made the output messy. This made it clear to me that the more chatty ChatGPT was working for me since it seems to incorporate new stuff better. Great for those "Oh, crap I didn't think of that" moments that come up for inexperienced devs like me.

With ChatGPT I use a modular approach. I first plan a high-level concept with o3, then we consider best practices for each part. After that I get the best results with 4o and Canvas, since that model doesn't seem to overthink and change direction as much. Granted, my creations are not pushing up against the limits of human knowledge, but I consistently get clean, maintainable results this way.

Recently I made a browser extension to show me local times when I hover over text on a website that shows an international time. It uses regex to find the text, and I would never have been able to crank this out myself without spending considerable time learning it.
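
The core of that is just a regex plus a timezone conversion. A rough sketch of the idea in Python (the pattern and zone mapping are illustrative only; the real extension is JavaScript and handles many more formats):

    import re
    from datetime import datetime
    from zoneinfo import ZoneInfo  # Python 3.9+

    # Illustrative pattern: matches strings like "14:30 UTC" or "9:05 PST"
    TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\s*(UTC|GMT|PST|EST|CET)\b")
    ZONES = {"UTC": "UTC", "GMT": "UTC", "PST": "America/Los_Angeles",
             "EST": "America/New_York", "CET": "Europe/Berlin"}

    def to_local(text, local_zone="Europe/London"):
        m = TIME_RE.search(text)
        if not m:
            return None
        hour, minute, abbr = int(m.group(1)), int(m.group(2)), m.group(3)
        # today's date in the source zone, with the matched wall-clock time
        src = datetime.now(ZoneInfo(ZONES[abbr])).replace(hour=hour, minute=minute)
        return src.astimezone(ZoneInfo(local_zone)).strftime("%H:%M")

    print(to_local("The call starts at 14:30 UTC"))  # local equivalent, e.g. "15:30"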

This weekend I made a Linux app to help rice a spare monitor so it shows scrolling cheat sheets to help me memorize stuff. This turned out so well, that I might put it up on GitHub.

For dilettantes like me this opens up a whole new world of fun and possibilities.

0x000xca0xfe 2 days ago

Nope, at least Claude code is very useful IME.

Great for annoying ad-hoc programming where the objective is clear but I lack the time or motivation to do it.

Example: After benchmarking an application on various combinations of OS/arch platforms, I wanted to turn the barely structured notes into nice graphs. Claude Code easily generated Python code that used a cursed regex parser to extract the raw data and turned it into a bunch of grouped bar charts via matplotlib. Took just a couple minutes and it didn't make a single mistake. Fantastic time saver!

This is just an ad-hoc script. No need to extend or maintain it for eternity. It has served its purpose and if the input data will change, I can just throw it away and generate a new script. But if Claude hadn't done it, the graphs simply wouldn't exist.
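
For scale, the whole thing is on the order of the sketch below (the file name, line format, and labels are made up; the real notes were messier):

    import re
    from collections import defaultdict
    import matplotlib.pyplot as plt

    # Hypothetical notes format: lines like "linux/amd64 sort 123.4"
    LINE_RE = re.compile(r"(\S+/\S+)\s+(\S+)\s+([\d.]+)")

    results = defaultdict(dict)  # results[benchmark][platform] = value
    with open("notes.txt") as f:
        for line in f:
            m = LINE_RE.match(line.strip())
            if m:
                platform, bench, value = m.group(1), m.group(2), float(m.group(3))
                results[bench][platform] = value

    benches = sorted(results)
    platforms = sorted({p for r in results.values() for p in r})

    fig, ax = plt.subplots()
    width = 0.8 / len(platforms)
    for i, platform in enumerate(platforms):
        xs = [j + i * width for j in range(len(benches))]       # one bar per benchmark
        ys = [results[b].get(platform, 0) for b in benches]
        ax.bar(xs, ys, width, label=platform)

    ax.set_xticks([j + 0.4 - width / 2 for j in range(len(benches))])  # center of each group
    ax.set_xticklabels(benches)
    ax.legend()
    plt.savefig("benchmarks.png")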

Update: Sorry, missed "writing self-contained throwaway pieces of code"... well for core development I too haven't really used it.

Phelinofist 2 days ago

The company I work for rolled out GitHub Copilot. It is quite underwhelming honestly. We have a lot of homegrown frameworks that it does not know about (I mean, to be fair, it actually cannot know about these). When I asked it to explain some piece of code it just repeated what was stated in the JavaDoc, nearly verbatim. I had a class that had obvious null issues and I asked it to fix it and it introduced quite a lot of unnecessary "// we have to check it is not null "-style comments.

reify 2 days ago

An IBM study based on conversations with 2,000 global CEOs recently found that only 25% of AI initiatives have delivered their expected ROI over the last few years, and, worse still, "64% of CEOs surveyed acknowledge that the risk of falling behind drives investment in some technologies before they have a clear understanding of the value they bring to the organization." 50% of respondents also found that "the pace of recent investments has left their organization with disconnected, piecemeal technology," almost as if they don't know what they're doing and are just putting AI in stuff for no reason.

https://newsroom.ibm.com/2025-05-06-ibm-study-ceos-double-do...

  • esafak 2 days ago

    CEOs are supposed to be able to make snap decisions without having the big picture. If everybody else is doing it, following suit will mean you won't be left behind. Worst case, you'll all make the same mistake. Of course, if you know better than the others, you can profit from that.

pabna 3 hours ago

I think they're amazing, but they're still a new tool, only 3 years old, and we have like 20 years left till Super-AGI.

cube2222 2 days ago

I get a lot of value out of LLMs including for existing codebases and authoring / modifying code in them.

However, only maybe 10% of that is agentic coding. Thus, my recommendation would be - try non-agentic tools.

My primary workflow is something that works with the Zed editor, and which I later ported as a custom plugin to Goland. Basically, you first chat with the AI in a sidebar possibly embedding a couple of files in the discussion (so far nothing new), and then (this is the new part) you use contextual inline edits to rewrite code "surgically".

Importantly, the inline edits have to be contextual: they need to know both the content of the edited file and the conversation so far, so they will usually just have a prompt like "implement what we discussed". As far as I know, only Zed's AI assistant supports this.

With this I've had a lot of success. I still effectively make all architectural decisions, it just handles the nitty-gritty details, and with enough context in the chat from the current codebase (in my case usually tens of thousands of tokens worth of embedded files) it will also adhere very well to your code-style.

  • maxnevermind 2 days ago

    > From all I know, only Zed's AI assistant supports this.

    You mean the session context awareness? I thought it is a default in all major IDE/plugins. Or you mean some specific trait of that feature?

    • cube2222 a day ago

      For inline edits, yeah.

      The easiest way to check is to put some “secret passphrase” in the chat, and then try using inline edits to “add the passphrase as a comment”.

nico 2 days ago

It really depends on the task and your preferences

I’ve had great success using LLMs for things that I haven’t done in a while or never before. They allow me to build without getting too bogged down into the details of syntax

Yes, they require constant attention, they are not fully independent or magical. And if you are building a project for the longer run, LLM-driven coding slows down a lot once the code base grows beyond just a couple of basic files (or when your files start getting to about 500-800+ lines)

I’ve tried several agentic editors and tools, including cursor, they can def be helpful, but I’d rather just manually loop between ChatGPT (o4-high-mini for the most part) and the editor. I get a very quick and tight feedback loop in which I get plenty of control

Git is essential for tracking changes, and tests are gold once you are at a certain size

csomar 2 days ago

I was just struggling with getting a standard/template npm package up and running with a few customizations, and ended up just following one of the popular npm packages. This was with Claude 4, and although it is good at writing code, I feel like it gets dumber at certain tasks, especially when you want it to connect things together. It very easily messes one thing up, and then when you include the errors, it spirals from there into madness.

> Am I just not using the tools correctly?

No, there is no secret sauce and no secret prompting. If LLMs were that capable, we'd see lots of new software generated by them, given how fast LLMs are at writing code. Theoretically, assuming a conservative 10 tokens/s and ~100M tokens for a Chromium-sized code base, you could write a new browser with LLMs in only 115 days.
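
Back-of-the-envelope, in case anyone wants to check that arithmetic:

    tokens = 100_000_000   # rough Chromium-scale token count from above
    rate = 10              # conservative tokens per second assumed above
    print(tokens / rate / 86_400)  # ~115.7 days of nonstop generation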

spiderxxxx a day ago

I've been programming in python for over 20 years. An LLM creates code that sometimes works, but it definitely doesn't meet my standards, and there's no way I'd use the code since I couldn't support it. People who have less experience in Python might take that working code and just support that with their LLM, still having no clue what it does or why it works. That's probably fine for MVP but it won't stand up in the real world where you have to support the code or refactor it for your environment.

I tried to use an LLM to write a simple curses app - something where there's a lot of code out there, but most of it is bad, and of course it doesn't work and there are lots of quirks. I then asked it whether there are libraries out there that are better than curses; it gave me 'textual', which at first seemed like an HTML library but is actually a replacement for curses. It did work, and I had some working code at the end, but I had to work around platform inconsistencies and deal with the LLM including outdated info like inline styles that are unsupported in the current version of the library.

That said, I don't quite understand the code that it produced. I know it works and it looks nice, but I need to write the code myself if I want a deeper understanding of the library, so that I can support it. You won't get that from asking an LLM to write your code for you, only from using what you learn. It's like learning any language. You could use Google Translate to translate what you want, and it may seem correct at first glance, but it ultimately won't convey what you want, with all the nuance you'd get if you just learned the language yourself.

idw 2 days ago

Simon Willison has helpful recent advice on this: Here’s how I use LLMs to help me write code (11th March 2025) https://simonwillison.net/2025/Mar/11/using-llms-for-code/

dlevine 2 days ago

I think of LLMs as knowing a lot of things but as being relatively shallow in their knowledge.

I find them to be super useful for things that I don't already know how to do, e.g. a framework or library that I'm not familiar with. It can then give me approximate code that I will probably need to modify a fair bit, but that I can use as the basis for my work. Having an LLM code a preliminary solution is often more efficient than jumping to reading the docs immediately. I do usually need to read the docs, but by the time I look at them, I already know what I need to look up and have a feasible approach in my head.

If I know exactly how I would build something, an LLM isn't as useful, although I will admit that sometimes an LLM will come up with a clever algorithm that I wouldn't have thought up on my own.

I think that, for everyone who has been an engineer for some time, we already have a way that we write code, and LLMs are a departure. I find that I need to force myself to try them for a variety of different tasks. Over time, I understand them better and become better at integrating them into my workflows.

the__alchemist a day ago

I speculate, from this, that you are giving the LLMs too big of a task. Not quite vibe coding, but along that path.

> "You've refactored most of the file but forgot a single function"). It would take many many iterations on trivial issues, and because these iterations are slow that just meant I had to context switch a lot, which is also exhausting.

Try prompts like this:

"Given these data structures: (Code of structs and enums), please implement X algorithm in this function signature. (Post exact function signature)."

Or: "This code is repetitive. Please write a macro to simplify the syntax. Here's what calling it should look like (Show macro use syntax)"

Or: "I get X error on this function call. Please correct it."

Or: "I'm copying these tables into native data structures/match arms etc. Here's the full table. Here's the first few lines of the native structures: ...""

ants_everywhere 2 days ago

I'm getting real valuable work done with aider and Gemini. But it's not fun and it's not flow-state kind of work.

Aider, in my humble opinion, has some issues with its loop. It sometimes works much better just to head over to AI studio and copy and paste. Sometimes it feels like aider tries to get things done as cheaply as possible, and the AI ends up making the same mistakes over again instead of asking for more information or more context.

But it is a tool and I view it as my job to get used to the limitations and strengths of the tool. So I see my role as adapting to a useful but quirky coworker so I can focus my energy where I'm most useful.

It may help that I'm a parent of intelligent and curious little kids. So I'm used to working with smart people who aren't very experienced and I'm patient about the long term payoff of working at their level.

tootyskooty 2 days ago

I still get a lot more value from webuis than from the agentic offerings, also for more general coding tasks. The main benefit is that I can more easily manage context. A muddied or insufficient context can sometimes degrade performance by a whole model generation, ime.

What works for me is collecting it manually and going one implementation chunk at a time. If it fails, I either do it myself or break it down into smaller chunks. As models got better these chunks got larger and larger.

Collecting context manually forces me to really consider what information is necessary to solve the problem, and it's much easier to then jump in to fix issues or break it down compared to sending it off blind. It also makes it a lot faster, since I shortcircuit the context collection step and it's easier to course-correct it.

Collecting manually is about 10 seconds of work as I have an extension that copies all files I have opened to the clipboard.

estebarb 2 days ago

In my experience it works better in more constrained and repetitive domains. For example: it is better at doing a Ruby on Rails website than an ASP.NET web service.

Particularly with data structures it is garbage: it never understands the constraints that justify writing a new one instead of relying on the ones from the standard library.

And finally, it is incapable of understanding changes of mind. It will go back to stuff already discarded or replaced.

The worst part of all is that it insists on introducing its own "contributions". For example, recently I have been doing some work on ML and I wanted to see the effect of some ablations. It destroyed my code by adding back all the stuff I had removed on purpose.

Overall, it provides small typing/search savings, but it cannot be trusted at all yet.

blueboo 2 days ago

They can do chores that clear your day to do more challenging work.

That’s immensely valuable and pretty game changing

Maro 2 days ago

I'm a manager at work, so I only code during the weekends on personal projects these days. In that context, I get a lot of value out of plain-vanilla ChatGPT, it's able to write high-quality code for my toy use-cases in the 100-500 LOC range.

tomyedwab 2 days ago

The comments on this thread are a perfect mixture of Group A, explaining how there is no value in AI tools, and if there was, where is the evidence? And Group B, who are getting value from the tools and have evidence of using them to deliver real software, but are being blasted by Group A as idiots who can't recognize bad code. Why so angry?

I've been writing code for 36 years, so I don't take any of the criticism to heart. If you know what you are doing, you can ship production quality code written by an LLM. I'm not going to label it "made by an AI!" because the consumer doesn't care so long as it works and who needs the "never AI!" backlash anyway?

But to the OP: your standards are too high. AI is like working with a bright intern, they are not going to do everything exactly the way that you prefer, but they are enthusiastic and can take direction. Choose your battles and focus on making the code maintainable in the long term, not perfect in the short term.

fidotron 2 days ago

I have seen two apparently contradictory things.

Firstly, there absolutely are people popping up in certain domains with LLM-assisted products they could not have managed otherwise, with results you would not suspect were made that way if you were not told.

However, I share the same problem myself. The root of it is "analysis is harder than synthesis". i.e. if you have to be sure of the correctness of the code it's far easier to write it yourself than establish that an LLM got it right. This probably means needing to change how to split things out to LLMs in ways human co-workers would find intolerable.

noahbp 2 days ago

> I've tried:

> - Cursor (can't remember which model, the default)

> - Google's Jules

> - OpenAI Codex with o4

Cursor's "default model" rarely works for me. You have to choose one of the models yourself. Sonnet 4, Gemini 2.5 Pro, and for tricky problems, o3.

There is no public release of o4; you used o4-mini, a model with poorer performance than any of the frontier models (Sonnet 4, Gemini Pro 2.5, o3).

Jules and Codex, if they're like Claude Code, do not work well with "Build me a Facebook clone"-type instructions. You have to break everything down and make your own tech stack decisions, even if you use these tools to do so. Yes they are not perfect and make regressions or forget to run linters or check their work with the compiler, but they do work extremely well if you learn to use them, just like any other tool. They are not yet magic that works without you having to put in any effort to learn them.

tshadley 2 days ago

100% agreed with your experience: AI provides little value within one's own area of expertise (10+ years or more). It's the context length -- the AI needs comparable training or inference-time cycles.

But just wait for the next doubling of long task capacity (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...). Or the doubling after that. AI will get there.

ishwarjha a day ago

A few days ago I discovered a few security issues in one of the software we have developed.

I instantly decided to review the frontend and backend code with AI (used cursor and GitHub copilot)

It reported a dozen more issues which otherwise would have taken a few weeks to find.

We asked the AI to generate code that would help with security, providing it rules about our technology stack, coding guidelines, project structure, and product description.

We got good recommendations, but couldn't implement the suggestions as-is.

However, we took the advice and hand-coded the suggestions across all the code files.

The entire exercise took a week for a fairly large project.

As per my tech lead, it would have taken a minimum of 2 months otherwise.

So, it works.

therealmarv 2 days ago

There are two key points which are important to get most out of LLM coding assistants:

1. Use a high-quality model with a big context window via API (I recommend OpenRouter). E.g. Google Gemini 2.5 Pro is one of the best at keeping consistently good quality (OpenAI reasoning models can be better at problem solving, but they're kind of a mixed bag). Other people swear by the Claude Sonnet models.

2. Upgrade the coding tools you combine with these high-quality models. Google Jules and OpenAI Codex are brand new and have a totally different aim than Cursor; don't use them (yet), though maybe they will get good enough in the future. I would focus on established tools like aider (steepest learning curve) or Roo Code (easier), paired with OpenRouter, and if you want to have it really easy, Claude Code (only useful with a 100-200 USD Anthropic subscription IMHO). On average you will get better results with aider, Roo, or Claude Code than with Cursor or Windsurf.

Btw, I think Cursor and Windsurf are great as a starter because you buy a subscription for 15-20 USD and are set. It may well be that the higher-quality tools burn more tokens and you spend more per month, but you also get better quality back in return.

Last but not least and can be applied to every coding assistant: Improve your coding prompts (be more specific in regards to files or sources), do smaller and more iterations until reaching your final result.

  • prmph 2 days ago

    It's not about the tools. The complaint here is a general one that shows up no matter what tools you are using

    • infecto 2 days ago

      The tools do matter. The original complaint is dribble. The poster does not even know what model they used in cursor which is surprising, it’s the most important part of the process. They also compared two entirely brand new tools that are not adjacent to cursor.

      • prmph 2 days ago

        Ok, that's the "you're holding it wrong" thing.

        When I start to stray into even moderately complex work, LLMs become pretty useless quick. Tell me your setup, and I will give a quick sample task that it will fail at. Stop the fanboyism please

        • morsecodist 2 days ago

          Thank you. It is so frustrating that you hear that LLMs are a PhD level intelligence capable of any task you can throw at it and when it can't solve your problem you hear: "well you are using a1.36 and b1.37-high is really the one this time" (despite the fact that you have been hearing these claims since before that model came out) or "you are prompting it wrong, have you tried describing all of your app features and your entire approach to coding in a text file then using that to get the AI to make a list of prompts then refining those prompts into different text files and put those back into the AI..."

          • infecto a day ago

            Totally fair frustration. Unfortunately model/version does matter—it’s not pedantry, it’s debugging. And no, you shouldn’t need a prompt engineering PhD to get value, but some structure and awareness of tool limits go a long way.

            • morsecodist a day ago

              It's not that I think model version doesn't matter. I switch between them all the time (often to downgrade as much as upgrade honestly). It's that I think people are misrepresenting the kinds of results you can get from these models and seem to take it as a personal attack and come up with excuses when you talk about limitations that you've encountered. It makes it difficult to engage in conversations about tools and I've gotten to the point where I don't believe anything anyone says about it anymore and I just try tools for myself.

              I said people are saying the models are PhD level intelligent not that you need to be. I get a ton of value from them and I don't have a PhD.

              • infecto a day ago

                When the original post has no clue what model they are using it throws all credibility out the window. At that point it’s appropriate to point that out to them with suggestions. Nobody here was suggesting that LLMs are PhDs like you are saying. You are the only one bringing that up.

                • prmph a day ago

                  > When the original post has no clue what model ...

                  Well, that's the point. As long as they are using a recent-ish model it really doesn't matter. Not that there are no differences in performance between models; it's that there is no model today that even comes close to not requiring extensive hand-holding to accomplish real-world software engineering of even slightly moderate complexity.

                  Case in point: I have been frustrated that most markdown viewers don't do automatic indentation of section levels. I thought, this is a perfect test for coding assistants: the problem and solution are straightforward conceptually, and I don't even care about the platform and architecture used to accomplish it.

                  I've asked all the major models to implement a simple markdown viewer that could do automatic indentation, and they all fall flat. Some even give me code that will not run; of the rest, none has provided code that basically does the thing I've asked for.

                • morsecodist a day ago

                  I am more referring to my experience in general not just in the thread. I see this PhD thing a lot in the media.

          • qwerasdf5 2 days ago

            Who told you that LLMs are a "PhD level intelligence"?

        • therealmarv 2 days ago

          I just hope you keep trying with whatever tool you want. I see many developers get frustrated after some deep testing of AI coding tools and then never look back and never try again. This means they will be stuck with soon-outdated knowledge and experience.

          The LLMs advance and so does the quality. What e.g. OpenAI's o3 can do is so much better than what GPT 3.5 could do.

          I'm convinced every developer who does not stay up to date with AI coding tools and know how to use them will become obsolete sooner or later.

        • infecto a day ago

          Not knowing which model you're using is doing it wrong; unfortunately, that matters with current-gen tools. The differences are significant. And while LLMs do hit limits fast on deep, complex work, dismissing them outright misses the real utility: they're great at the tedious stuff. No fanboyism, just middle of the road: it works great for some things, ok for others, and terrible for the rest.

      • olddustytrail 2 days ago

        > The original complaint is dribble.

        The usual word is "drivel" rather than "dribble".

        • infecto a day ago

          Thanks, was not paying attention due to lack of sleep!

    • therealmarv 2 days ago

      And I can tell you from personal experience: the tools matter (!) for more complex problems.

      OP literally asked "Am I just not using the tools correctly?" and I said that 2 of the 3 tools OP tried are so new and experimental that I would put them away (for now).

      • prmph 2 days ago

        You're still perpetuating this myth that it is about the tools, when every interaction I've had with coding assistants and LLMs veers into the territory OP describes. To be fair, the work I am doing is pretty complex and novel.

        And I use a wide range of models/versions. I mostly use Claude, from 3.5 to the newest 4.0. I also use Gemini and Copilot.

        • therealmarv 2 days ago

          It's not a myth. I don't promise that better tools solve every problem, but they can increase the quality and usefulness of the output. One of the main problems of (nearly) all AI coding tools is getting enough context, which means the right files need to be sent along with a prompt. The bigger a project gets, the more important that is. It's kind of an AI task in itself just to gather the right files for the prompt that is sent to the main solving AI.

          E.g. in aider I specify manually which files will be sent along with the prompt, and they are not truncated or ignored, something which can happen easily behind the scenes in tools like Copilot, Cursor, even roo code. When that happens the main AI does not see the whole picture of the project and makes wrong guesses and assumptions. More detail in your prompts, in every tool, helps the AI understand the bigger picture and the important files. It may even help to describe the whole project and its structure in a text file and send that along with every prompt.

          Roo code is smarter about this problem, Claude Code is also pretty smart about it, and in aider the problem does not exist if you, as a human, understand the project structure and tell aider which context files to send to the AI.
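
          Roughly, an aider session along those lines looks something like this (the model string, flags and file names are just placeholders; check the aider docs for your version):

              # start aider with a hand-picked context instead of letting a tool guess
              aider --model openrouter/google/gemini-2.5-pro src/billing/invoice.py tests/test_invoice.py

              # inside the chat, keep managing the context yourself
              /read-only docs/architecture.md    # background the model may read but not edit
              /add src/billing/tax.py            # pull another file in when it becomes relevant
              /drop tests/test_invoice.py        # and drop what no longer matters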

        • kordlessagain 2 days ago

          You keep insisting that it isn't about the tools, but it definitely is about the tools it has available to do things.

AfterHIA 2 days ago

I like to think of the painter who is trained in classical design; they spend their time thinking about Baroque diagonals and golden ratios. They have a "deep knowledge" of how paintings are classically organized. One day they encounter a strange machine that can produce any image they wish. Setting aside the economic impact this would have on their craft, they are often asked, "isn't this machine a marvel?" Their reply is often, "it sheds no light for me on why certain ratios produce beautiful forms, it tells me nothing of the inspired mind of poets, nor of the nature of the Good."

In Plato's Republic, Socrates compares the ability to produce a piece of furniture with the ability to produce the image of a cabinet (and so forth) with a small compact mirror; what is the difference, if a deceivable crowd doesn't know the difference?

Simpletout 15 hours ago

If you have ever done declarative programming, then you know what LLMs bring to it. Where before you had to state what you wanted in a very formal way, now you can do the same by writing your prompt in a very relaxed form. Your declarations are now fed by thousands of implementation sources, rather than just a few. Otherwise I do not see much difference. I would also warn that using a very formal language is much more concise, and you get a result in much less time. However, most people try to avoid any learning.

discordance 2 days ago

Have you tried using a rules file for those micromanage-y tasks you mentioned?

I have collected similar requests over time and I don't have to remind GH copilot/Claude as much anymore.
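
For example, something like this in .github/copilot-instructions.md or CLAUDE.md (the wording and the make targets are just placeholders for whatever your repo actually uses):

    - Follow the existing style of the file you are editing; do not reformat untouched code.
    - Run `make fmt` and `make test` and make sure both pass before calling a task done.
    - Do not add explanatory comments unless asked.
    - Keep diffs small; never refactor code outside the scope of the request.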

_boffin_ 2 days ago

I just had an LLM build my side business's website and full client portal. About 15k LOC.

It’s amazing. Better design in terms of UI / UX than I could have fathomed and so much more.

There’s a lot of duplicated code that I’ll clean up, but the site functions and will be launched for clients to start using soon.

For my day job, it’s also helping me build out the software at a faster pace than before and is an amazing rubber duck.

  • maxnevermind 2 days ago

    Have you read through all 15k LOC? What is your feeling about its maintainability?

    Also, as I understand it, one of the main problems with LLMs right now is applying surgical changes to a large enough code base, or adding extra functionality without breaking/altering what already exists. Have you faced issues with that?

    Another issue is security. I've heard some horror stories of non-tech people developing web solutions only to later find them destroyed by hackers, because they didn't know where to look to find the holes in their design.

    • _boffin_ 2 days ago

      I'm finishing up some hardening features and some additional functionality right now. It's "maintainable" per se, but I will be going through it and cleaning it up a lot. It's a Django application. It likes to write out large single files; I already went through and broke things up into similar views and functionality.

      And as for security, were they even caring about it? Were they asking good questions? Reading about things? Learning ways to secure things?

      If someone blindly follows what it gives them, something interesting will happen at some point. Use it as an enhancer.

decGetAc 2 days ago

Maybe I'm using these models incorrectly but I just don't ask it to do a lot at once and find it extremely useful.

Write tests for x in the style of this file to cover a, b, c.

Help me find a bug here within this pseudo code that covers three classes and a few functions. Here's the behavior I see, here's what I think could be happening.

I rarely give it access to all the code; I usually give it small portions and ask for small things. I basically treat it as if I were reaching out to another senior developer in a large company or on SO. They don't care to learn about all the details that don't matter, and want a well-formed question that's not wasting their time and that they can help with.

Using it this way I absolutely see the benefits and I'd say an arbitrary 1.25x sounds right (and I'm an experienced engineer in my field).

I'll just quietly keep using it this way and ignore the overwhelming hype on both sides (the "it's not a speedup" camp and the "it's 100x" camp). IMO both are wrong, but the "it's not a speedup" camp makes me question how they're using it the most.

chewz 2 days ago

I use this analogy. In the early 90's I had been programming in assembler and sometimes in pure hex codes. I was very good at that, creating really effective code: tight, using as few resources as possible.

But then resources became cheap and it stopped mattering. Yeah, tight, well designed machine code is still some sort of art form, but for practical purposes it makes sense to write a program in a higher level language and waste a few MB...

  • lordkrandel 2 days ago

    I don't agree. It may be true that most code is throwaway.

    But you trust a C compiler, or a Python interpreter, to do their job in a deterministic way. You will never be able to trust Copilot telling you that "this should be the code you are using".

    It may suggest you use AWS, or Google, or Microsoft, or Tencent infrastructure. An LLM can even push you toward a specific style, or political agenda, without you even realizing it.

    I hate the polarized, all-or-nothing thinking in discussions about LLMs. See how perfectly and reliably they can translate text into whatever language. See them fail at aligning a table with a monospace font.

    • kragen 2 days ago

      I think you probably need a multimodal LLM to align a table with a monospace font. Blind human programmers using screenreaders will also have difficulty with that particular task. It doesn't mean they're bad at programming; it just means they can't see.

      • lordkrandel 2 days ago

        Blind human programmers may write programs to align their own tables as requested, autonomously. Instead of me watching them do it wrong, asking for a fix, and them doing it wrong again in an endless cycle. Also, when I tell them the solution and ask to add a feature - bam, the table is made wrong again.

  • noosphr 2 days ago

    A C program is understandable even if it isn't efficient. It's also deterministic, or close enough that it shouldn't matter.

    An LLM project that can be regenerated from scratch every time is maybe understandable if you use very good prompts and a lot of grounding text. But it is not deterministic unless you use zero temperature and stick with the same model forever, something that's impossible now. Six months ago the state of the art model was DeepSeek R1.

  • bonki 2 days ago

    This is exactly why we can't have nice things :(

HenryBemis 2 days ago

I tend to spend 15 mins writing clear requirements (functional and non-functional specs), and then ChatGPT works its miracle. When I just ask it "write code in X language that does so-and-so", it's proper crapware. But those 15 mins of reqs save me 4-5 hours of writing back and forth, looking at manuals, etc. (I'm not a super dev, I'm not even a professional dev.) I ask it to write code for me as I'm trying to solve (my) small IT problems, and from looking at various marketplaces, nobody has posted code/software that does what I want.

Perhaps one day I'll 'incorporate myself' and start posting my solutions and perhaps make some dough... but I benefit far more than the $20 a month I am paying.

The right 'prompt' (with plenty of specs and controls) saves me from the (classic!) swing-on-tree example: https://fersys.cloud/wp-content/uploads/2023/02/4.jpg

pornel 2 days ago

I find they're useful for the first 100 lines of a program (toy problems, boilerplate).

As the project becomes non-trivial (>1000 lines), they get increasingly likely to get confused. They can still seem helpful, but they may be confidently incorrect, which makes checking their outputs harder. Eventually silly bugs slip through, costing me more time than all the time the LLMs saved previously.

sawmurai 2 days ago

Today I tried to use Copilot to go through a code base and export unexported types, interfaces and unions (TypeScript). It also had to rename them to make sure they were unique in the context of the package; otherwise I would have just used search & replace :)

It started out promising, renaming the symbols according to my instructions. Slower than if I had done it myself, but not horribly slow. It skipped over a few renames so I did them manually. I had to tell it to continue every 2 minutes, so I could not really do anything else in the meantime.

I figured it's quicker if I find the files in question (simple ripgrep search) and feed them to Copilot, so I don't have to wait for it to search all the files.

Cool, now it started to rename random other things and ignored the naming scheme I taught it before. It took quite some time to manually fix its mess.

Maybe I should have just asked it to write a quick script to do the rename in an automated way instead :)

calrain 2 days ago

  It's easy to get good productivity out of LLMs in complex apps, here are my tips:

  Create a directory in the root of your project called /specs

  Chat with a LLM to drill into ideas having it play the role of a Startup Advisor, work through problem definitions, what your approach is, and build a high level plan.

  If you are happy with the state of your direction, ask the LLM to build a product-strategy.md file with the sole purpose of describing to an AI Agent what the goal of the project is.

  Discuss with an LLM all sorts of issues like:

    Components of the site

    Mono Repo vs targeted repos  

    Security issues and your approach to them

    High level map of technologies you will use

    Strong references to KISS rules and don't over complicate

    A key rule is do not build features for future use


  Wrap that up in a spec md file

  Continue this process until you have a detailed spec, with smaller .md files indexed from your main README.md spec file

  Continue promoting the role of AI Developer, AI Consumer, AI Advisor, End User

  Break all work into Phase 1 (MVP), Phase 2, and future phases, don't get more granular (only do Phase 2 if needed)

  Ask LLM to document best practice development standards and document in your CLAUDE.md or whatever you use. Discuss the standards, err to industry standard if you are lost

  Challenge the LLM while building standards, keep looping back and adjusting earlier assumptions

  Instruct AI Agents like Claude Code to target on specific spec files and implement only Phase 1. If you get stuck on how to do that, ask an LLM on how to prompt your coding agent to focus, you will learn how they operate.

  Always ask the coding agent to review any markdown files used to document your solution and update with current features, progress, and next issues.

  Paste all .md files back into other AI's e.g. high models of ChatGPT and ask it to review and identify missing areas / issues

  Don't believe everything the agents say, challenge them and refuse to let them make you happy or confirm your actions, that is not their job.

  Always provide context around errors that you want to solve, read the error, read the line number, paste in the whole function or focus your Cursor.ai prompt to that file.

  Work with all the AI's, each has their strength.

  Don't use the free models, pay, it's like running a business with borrowed tools, don't.

  Learn like crazy, there are so many tips I'm nowhere near learning.

  Be kind to your agent
(Edited: formatting)
  • esafak 2 days ago

    Your formatting made it harder to read. I'd remove the leading spaces, except for list items.

rikroots 2 days ago

I've spent the past week overcoming my fear of Google's Gemini and OpenAI's ChatGPT. Things I've learned:

- Using an AI for strange tasks, like using a TTS model to turn snippets of IPA text (for a constructed language) into an audio file (via CLI) - much of the task turned out to be setting up stuff. Gemini was not very good when it came to giving me instructions for doing things in the GCP and Google Workspace browser consoles. ChatGPT was much clearer with instructions for setting up the AWS CLI locally and navigating the AWS browser console to create a dedicated user for the task etc. The final audio results were mixed, but then that's what you get when trying to push a commercial TTS AI into doing something it really thinks you're mad to try.

- Working with ChatGPT to interrogate a Javascript library to produce a markdown file summarising the library's functionality and usage, to save me the time of repeating the exercise with LLMs during future sessions. Sadly the exercise didn't help solve the truly useless code LLMs generate when using the library ... but it's a start.

- LLMs are surprisingly good at massaging my ego - once I learned how to first instruct them to take on a given persona before performing a task: <As an English literature academic, analyse the following poem: title: Tournesols; epigraph: (After "Six Sunflowers, 1888" by van Gogh / felled by bombs in 1945); text: This presented image, dead as the hand that / drew it, an echo blown to my time yet // flames erupt from each auburn wheel - / they lick at the air, the desk: sinews // of heat shovelled on cloth. Leaves / jag and drop to touch green glaze - // I want to tooth those dry seeds, sat / by the window caught on the pot's neck // and swallow sunshine. So strong / that lost paint of the hidden man.>

I still fear LLMs, but now I fear them a little less ...

protocolture 2 days ago

I find that code quality is an LLM's first priority. It might not execute, but it will have good variable names and comments.

I find that so far their quality is horizontal, not vertical.

A project that involves small depth across 5 languages/silos? Extremely useful.

A long project in a single language? Nearly useless.

I feel like it's token memory. And I also feel like the solution will be deeper code modularisation.

birn559 2 days ago

As coding becomes more popular every year, a lot of people are at junior level or work in an environment that only needs junior-level skills. I assume those people benefit a lot from LLMs.

There's also a lot of marketing, it's cool to hype LLMs, and I guess people like to see content about what they can do on YouTube and Instagram.

aristofun 2 days ago

As google on steroids - yes. Next level helpful and offloads a lot of dumb and repetitive work from you.

As a developer buddy - no. LLMs don't actually think and don't actually learn like people do. That part of the overinflated expectations is going to hit some companies hard one day.

xiphias2 2 days ago

For me, anything before the Codex model with the web UI was a time waster.

Now with the web UI, what's important is to constantly add tests around the code base, and if it gets stuck, to go through the logs and understand why.

It's more of a management role of "unblocking" the LLM when it gets stuck and working with it, rather than fitting it into my previous workflow.

  • kordlessagain 2 days ago

    I agree with this. What tools are you using with Claude?

Narciss 2 days ago

I have been getting very good results.

What matters:

- the model: choose the SOTA (currently Claude 4 Opus). I use it mostly in Cursor.

- the prompt: give it enough context to go by, reference files (especially the ones where it can start delving deeper from), be very clear in your intentions. Do bullet points.

- for a complex problem: ask it to break down its plan for you first. Then have a look to make sure it’s ok. If you need to change anything in your plan, now’s the time. Only then ask it to build the code.

- be patient: SOTA models currently aren’t very fast

I work at a company with millions of MAU, as well as do side projects - for the company, I do spend a bit more time checking and cleaning code, but lately with the new models less and less.

For my side projects, I just bang through with the flow above.

Good luck!

platevoltage 2 days ago

The best value I get out of AI is just having "someone" to bounce ideas off of, or to get quick answers from. I find it's a HUGE help if I need to write or understand code in a language that I don't work with every day. I might get it to write me a function that takes X as input and outputs Y. I might get it to set up boilerplate. I have talked to people who say "I don't even write code anymore". That's definitely not me. Maybe it's a skill issue, but yeah, I'm kind of in the same boat too.

AdrianB1 2 days ago

I tried using LLMs for 2 types of work:

1. Improve the code that I already have. Waste of time, it never works. This is not because my code is too good, but because it is SQL with complex context and I get more hallucinations than usable code; the usable code is only good for basic tasks, nothing more.

2. Areas I rarely use and I don't maintain an expertise on. This is where it is good value, I get 80% of what I need in 20% of the time, I take it and complete the work. But this does not happen too often, so the overall value is not there yet.

In a way it's like RPA: it does something, not great, but it saves some time.

Vaslo 2 days ago

For a mediocre coder like myself, it's a game changer. Instead of fiddling around with reading a weird dictionary schema in Python, I simply give it a sample and boom, I have code. Not to mention it shows me new ways of doing things that I was only vaguely familiar with, and now that it does them by example, I have a better understanding of how to use them. I'm talking about things like decorators and itertools.

I think it's great for coders of all levels, but junior programmers will get lost once the LLM inevitably hallucinates, and experts will see gains, just not like those who are in the middle.

Yiling-J 2 days ago

I think Jules does a good job at "generating code I'm willing to maintain." I never use Jules to write code from scratch. Instead, I usually write about 90% of the code myself, then use the agent to refactor, add tests (based on some I've already written), or make small improvements.

Most of the time, the output isn't perfect, but it's good enough to keep moving forward. And since I’ve already written most of the code, Jules tends to follow my style. The final result isn’t just 100%, it’s more like 120%. Because of those little refactors and improvements I’d probably be too lazy to do if I were writing everything myself.

wejick 2 days ago

Working with an AI-centric IDE on a mature codebase needs a different skill set, one I suspect is related to people management, as the pattern is describing a problem well, making a good plan, delegating, and giving feedback.

On the other side, getting a good flow is not trivial. I had to tweak rules, how I describe the problem, how I plan the work, and how I ask the agent. It takes time to be productive.

E.g. asking the agent to create a script to do the string manipulation is better than asking it to do an in-place edit, as it's easier to debug and repeat.
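
A sketch of what I mean, with made-up names; the agent writes the script, I review it, run it, and can rerun it after tweaking the regex:

    # rename_setting.py -- the sort of throwaway script I ask for (names are made up):
    # renames one identifier across the repo, so the change is reviewable and repeatable.
    import pathlib
    import re

    OLD, NEW = "max_retries", "retry_limit"   # placeholders for whatever you're renaming

    for path in pathlib.Path("src").rglob("*.py"):
        text = path.read_text()
        updated = re.sub(rf"\b{OLD}\b", NEW, text)
        if updated != text:
            path.write_text(updated)
            print(f"rewrote {path}")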

HybridCurve 2 days ago

I've only had great luck with LLM-generated (ChatGPT 3o) Perl code. It was able to synthesize code for a GTK2/3 application fairly consistently, without generating any syntax errors. Most of the code worked as described, and it seemed to make more mistakes misunderstanding my descriptions of features than implementing them. My colleagues suggested it was because Perl's popularity had fallen significantly before 2016, so the training data set might have had much less noise.

nottorp 2 days ago

So you want to use an LLM on a code base. You have to feed it your code base as part of the prompt, which is limited in size.

I don't suppose there's any solution where you can somehow further train an LLM on your code base, so it becomes part of the neural net and not part of the prompt?

This could be useful on a large ish code base for helping with onboarding at the least.

Of course you'd have to do both the running and training locally, so there's no incentive for the LLM peddlers to offer that...

  • esafak 2 days ago

    Modern tools don't fine-tune on your code base but use RAG; they select the context to feed to the LLM with each request. The better the context inference algorithm, the better the results. See if your tool tells you which files it selected.
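
    A crude sketch of the idea (real tools use embeddings or a repo map rather than this toy keyword overlap, and the paths and query here are made up):

        # score each file by token overlap with the request, send only the top few
        import pathlib
        import re

        def tokens(text: str) -> set[str]:
            return set(re.findall(r"[A-Za-z_]\w+", text.lower()))

        def pick_context(query: str, root: str = "src", top_n: int = 3) -> list[pathlib.Path]:
            q = tokens(query)
            scored = []
            for path in pathlib.Path(root).rglob("*.py"):
                overlap = len(q & tokens(path.read_text(errors="ignore")))
                scored.append((overlap, path))
            scored.sort(key=lambda pair: pair[0], reverse=True)
            return [path for score, path in scored[:top_n] if score > 0]

        # only these few files, not the whole repo, get pasted into the prompt
        print(pick_context("why does invoice tax rounding differ between regions?"))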

    • nottorp 2 days ago

      Yes but why?

      On a large code base it would probably work better if you didn't need to put it all in the context... if you even could...

      • esafak 2 days ago

        I am saying the same thing; you want to selectively establish the context, not pass it all.

        • nottorp a day ago

          I want to train the LLM on the whole code base and then pass a hand picked context specific to what I'm asking.

          So it doesn't only suggest what can be found on w3schools and geeks4geeks and maybe stackoverflow, but also whatever idioms and utility functions my code base has.

101011 2 days ago

If I could offer another suggestion beyond what's been discussed so far: try Claude Code. They are doing something different than the other offerings in how they manage context with the LLM, and the results are quite different from everything else.

Also, the big difference with this tool is that you spend more time planning. Don't expect it to one-shot; you need to think about how you go from epic to task first, THEN you let it execute.

cies 2 days ago

I found they work for tasks that have been done 1000s of times already. But for creative solutions in super specialized environments (lots of the work I do is just that) they cannot help me.

I expect they soon will be able to help me with basic refactoring that needs to be performed across a code base. Luckily my code uses strong types: type safety quickly shows where the LLM was tripping/forgetting.

  • esafak 2 days ago

    I find it helps to give it examples of what I want it to do in such cases.

irrational 2 days ago

At my current job, I need to write quite a bit of Python. I've been programming for enough decades that I can look at Python code and tell what it is doing, but creating it from scratch? No. But Copilot "knows" Python and can write the code that I can then read and tweak. Maybe someone who actually learned Python would write the code differently, but so far it works very well.

  • dowager_dan99 2 days ago

    I did this last week and the Python code it created technically "worked" but felt on par with your typical Excel VBA macro. Not something you'd want to maintain or show anyone else

zkry 2 days ago

Use cases like the ones you mentioned having are truly amazing. It's a shame that the AI hype machine has left us thinking of these use cases as practically nothing, leaving us disappointed.

My belief is that true utility will make itself apparent and won't have to be forced. The usages of LLMs that provide immense utility have already spread across most of the industry.

coffeefirst 2 days ago

I’ve landed where you are.

And given the output I’ve seen when I’ve tried to make it do more, I seriously doubt any of this magic generated software actually works.

65 2 days ago

I'd like to think the point of technology is to do the same thing but faster.

AI tools can do things faster, but at lower quality. They can't do the _same_ thing faster.

So AI is fine for specifically low quality, simple things. But for anything that requires any level of customization or novelty (which is most software), it's useless for me.

harrall 2 days ago

I find Junie (from JetBrains / IntelliJ) great compared to other offerings. I have 10+ years experience as well.

It writes JUnit tests with mocks, chooses to run them, and fixes the test or sometimes my (actually broken) code.

It’s not helpful for 90% of my work, but like having a hammer, it’s good to have when you know that you have a nail.

taosx 2 days ago

I'm getting a lot of value in areas where I don't have much experience but most of the time I still write the final version.

I'm not building commercial software and don't have a commercial job at the moment, so I'm kind of struggling with credits; otherwise I would probably blow $40-100 a day.

never_inline 2 days ago

Give it context and copy paste the code yourself.

They're good at coming up with new code.

Give it a function signature with types and it will give a pretty good implementation.

Tell it to edit something, and it will lose track.

The write-lint-fix workflow with LLMs doesn't work for me - the LLM goes monkey-brain and edits unrelated parts of the code.
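
For the function-signature point, a made-up example of the kind of stub I hand over; the types make the completion easy to check:

    # What I paste in (names are made up for illustration):
    def group_by_domain(emails: list[str]) -> dict[str, list[str]]:
        """Group email addresses by their domain, preserving input order."""
        ...

    # What it typically hands back, easy to verify against the signature:
    from collections import defaultdict

    def group_by_domain(emails: list[str]) -> dict[str, list[str]]:
        """Group email addresses by their domain, preserving input order."""
        groups: dict[str, list[str]] = defaultdict(list)
        for email in emails:
            _, _, domain = email.rpartition("@")
            groups[domain].append(email)
        return dict(groups)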

DonHopkins 2 days ago

I am struggling to programmatically get syntactically valid JSON out of LLMs, using both the OpenAI and Vertex apis.

I am using:

  "response_format": { "type": "json_object" }
And with Vertex:

    "generationConfig": {
      "responseMimeType": "application/json"
    }
And even:

  "response_format": {
    "type": "json_schema",
    "json_schema": { ...
And with Vertex:

    "generationConfig": {
      "responseMimeType": "application/json",
      "responseSchema": { ...

Neither of them is reliable.

It always gives me json in the format of a markup document with a single json code block:

    ```json
    {}
    ```
Sure I can strip the code fence, but it's mighty suspicious I asked for json and got markup.

I am getting a huge number of json syntax errors, so it's not even getting to the schemas.

When I did get to the schemas, it was occasionally leaving out fields that I'd declared were required (even if, e.g., null or an empty array). So I had to mark them as not required, since the strict schema wasn't guiding it to produce correct output, just catching it when it did.

I admit I'm challenging it by asking it to produce json that contains big strings of markup, which might even contain code blocks with nested json.

If that's a problem, I'll refactor how I send it prompts so it doesn't nest different types.

But that's not easy or efficient, because I need it to return both json and markup in one call. If I want to use "responseMimeType": "application/json" and "responseSchema", then it can ONLY be json, and the markup NEEDS to be embedded in the json, not the other way around; there's no way to return both while still getting json and schema validation. I'd hate to have to use tool calls as "out parameters".

But I'm still getting a lot of json parsing problems and schema validation problems that aren't related to nested json formatting.

Are other people regularly seeing markup json code blocks around what's supposed to be pure json, and getting a lot of json parsing and schema validation issues?

  • slashtmpslashme 2 days ago

    Use the structured output / response format with JSON schema feature if your model provides it (OpenAI does for sure; I think it's json_schema instead of json_object, and you provide a schema).

    Otherwise there are libraries like instructor (Python) which can help with getting valid, schema-conforming JSON out of models.
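
    If it helps, a minimal sketch with the OpenAI Python SDK looks roughly like this (the model name and schema are placeholders; strict schemas require every property to be listed in "required", so optional fields need nullable types):

        import json
        from openai import OpenAI

        client = OpenAI()

        schema = {
            "name": "doc_chunk",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "body_markdown": {"type": "string"},  # markup lives inside a JSON string
                    "tags": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title", "body_markdown", "tags"],
                "additionalProperties": False,
            },
        }

        resp = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # placeholder; any model that supports structured outputs
            messages=[{"role": "user", "content": "Summarise the attached notes as JSON."}],
            response_format={"type": "json_schema", "json_schema": schema},
        )
        data = json.loads(resp.choices[0].message.content)

    With a strict schema like this the content should come back as bare JSON rather than wrapped in a markdown code fence.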

koonsolo 2 days ago

I think you and your friends actually agree on the application of LLM's, and I agree with that too.

So the question really comes down to what kind of project you are developing:

Get an MVP fast? LLM is great!

Proof of Concept? LLM rules!

Big/Complex project? LLM junior developer is not up to the task.

bird0861 2 days ago

I would put myself in the 100x camp but my start in AI (and programming) was symbolic approaches in lisp/scheme, then I put that away for about a decade and got into NLP and then got into question answering and search/information retrieval. I have read probably at least 20 papers on LLMs a week every week since December 2022. So I have some familiarity with the domain but I also would not exactly call myself an expert on LLMs. However, I would say I am confident I mostly understand what is required for good results.

Getting the best possible results requires:

- an LLM trained to have the "features" (in the ML/DL sense of the word) required to follow instructions to complete your task

- an application that manages the context window of the LLM

- strategically stuffing the context window with preferences/conventions, design information, documentation, examples, and your git repo's repo map, and making sure you actually use rules and conventions files for projects

Do not assume the LLM will be able to retrieve or conjure up all of that for you. Treat it like a junior dev and lead it down the path you want it to take. It is true, there is a bit of micromanagement required, but Aider makes that very simple to do. Aider even makes it possible to scrape a docs page to markdown for use by the LLM. Hooking up an LLM to search is a great way to stuff the context window BTW; it makes things much simpler. You can use the Perplexity API with Aider to quickly write project plans and fetch necessary docs, then turn those into markdown files you'll load up later, after you switch to a proper code gen model. Assume that you may end up editing some code yourself; Aider makes launching your editor easy though.

This mostly just works. For fun the first thing I did with Aider was to write a TUI chat interface for ollama and I had something I could post to github in about an hour or two.

I really think Aider is the missing ingredient for most people. I have used it to generate documentation for projects I wrote by hand, I have used it to generate code (in one of my choice languages) for projects written in a language I didn't like. It's my new favorite video game.

Join the Aider discord, read the docs, and start using it with Gemini and Sonnet. If you want local, there's more to that than what I'm willing to type in a comment here but long story short you also need to make a series of correct decisions to get good results from local but I do it on my RTX4090 just fine.

I am not a contributor or author of Aider, I'm just a fanatical user and devotee to its way of doing things.

  • jz2dlimit 14 hours ago

    First responder who put themselves into the 100x category. Do you mind sharing what setup you use on your local RTX4090? Like what models you run and what kinds of prompt you feed it? I'm contemplating building an LLM rig to do some experimentation.

    It also sounds like you've read a lot of papers (20 per week since 2022 December would equate to 2000 by today). Do you have any favorites that really stood out to you?

shio_desu 2 days ago

Honestly I've been getting a lot of use out of LLMs for coding, and have been adjusting my approach to LLM usage over the past year and a half. The current approach I take, which has been fairly effective, is to spend a lot of focused energy writing out exactly what I'm looking to implement, sometimes taking 30 or more minutes creating a specs doc / implementation plan, and passing it to an agent with an architect persona to review and generate a comprehensive, phased implementation document. I then review the document, iterate with it to make sure the plan works well, then send it off to do the work.

I'm not yet a fan of Windsurf or Cursor, but honestly Roo Code's out-of-the-box personas for architect, and its orchestration to spin up focused subtasks, work well for me.

I am kind of treating it how I would a junior: guiding it there, giving it enough information to do the work, and checking it afterwards, making sure it didn't do things like BS test coverage or write useless tests/code.

It works pretty well for me, and I've been treating prompting these bots just as a skill I improve as I go along.

Frankly it saves me a lot of time. I knocked out some work Friday afternoon that I'd estimate was probably 5 pts of effort in 3 hours. I'll take the efficiency any day, as I've had less actual focused coding time than I used to in my career, due to other responsibilities.

klntsky 2 days ago

> I had to micromanage them infinitely

You are missing a crucial part of the process - writing rules

mhb 2 days ago

Agreed. I have an additional question. Given what look like limitations on tasks of larger scope, what problems are people who claim that they have run the LLMs for days working on?

th0ma5 2 days ago

All the comments here in the positive for LLMs, including all the posts by experts, can be summed up as "these are the lotto numbers that worked for me."

GaryNumanVevo 2 days ago

If you can't understand the code the LLM is writing, how can you expect to debug and troubleshoot it when your vibe coded feature gets deployed to production?

bjourne 2 days ago

> All these use cases work great, I save a lot of time.

So why are you complaining? I use AI all the time to give me suggestions and ideas. But I write the perfect code myself.

ragdollk 2 days ago

Honestly I think what this boils down to is whether you are concerned with good code or with working code. They make working code. And if there is a bug or an implementation that fails (which there usually is on the first shot), when asked they work to resolve it. LLMs are very results driven, and they will do almost anything to try and get the results requested. Even cheat. So, coders who are concerned with quality code will of course be unimpressed. Project managers who are concerned with a working piece of software will of course be impressed: they didn't have to ask a developer to create it. As far as impact on our culture goes... much like other disruptions in the past, they will win this war. They will win because they lower the barrier to entry to get working software. You ask what the value is - it's that.

r00sty 2 days ago

I always spend more time fighting the bot and debugging the code it gives me than it would take to write anything useful myself.

tonyhart7 2 days ago

Yeah, it's a monkeys-with-typewriters situation

but I think this is solvable when context length goes way higher than current length

souenzzo 2 days ago

my code setup hasn't changed in 10 years

I tried to use many LLM tools. They are generally not capable of doing anything useful in a real project.

Maybe solutions like MCP, which allow the LLM to access the git history, will make the LLM useful for someone who actually works on a project.

  • lyu07282 2 days ago

    I find that hard to believe. I also don't think LLMs provide the value some others are seeing, but there are also code search and refactoring tasks that LLMs can help with. Instead of having to write a codemod, I can just write a prompt and accomplish the same thing in much less time. To say they have literally zero value just seems uninformed tbh. That's the sort of black and white thinking that isn't very helpful in the conversation.

  • consumer451 2 days ago

    Postgres or Supabase MCP is really useful for me, especially when dealing with data related issues like permissions bugs. It seems faster to me. Example:

    > I cannot see the Create Project button as user project-admin@domain.com. Please use Supabase MCP to see if I have the correct permissions, if so, are we handling it correctly in the UI?

abdullin 2 days ago

I heard an interesting story from an architect at a large software consultancy. They are using AI in their teams to manage legacy codebases in multiple languages.

TLDR; it works for a codebase of 1M LoC. AI writes code a lot faster, completing tasks in days instead of sprints. Tasks can be parallelized. People code less, but they need to think more often.

(1) Maintain clear and structured architecture documentation (README, DDD context/module descriptions files, AGENTS-MD).

(2) Create detailed implementation plans first - explicitly mapping dependencies, tests, and potential challenges.

(3) Treat the implementation plan as the single source of truth until execution finishes. Review it manually and with LLM assistance to detect logical inconsistencies. A plan is easier to change than a scattered diff.

(4) In complex cases - instruct AI agents about relevant documents and contexts before starting tasks.

(5) Approve implementation plans before allowing AI to write code

(6) Results are better if code agent can launch automated full-stack tests and review their outputs in the process.

The same works for me in smaller projects. Less ceremony is needed there.
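
For what it's worth, a plan file in that spirit can be as small as this (the module names and structure are made up, just to show the shape):

    # docs/plans/add-rate-limiting.md  (illustrative)
    Goal: per-tenant rate limiting on the public API; no behaviour change for internal callers.
    Touched modules: gateway/middleware, billing/quotas, shared/config
    Steps:
      1. Quota lookup in shared/config (tests: defaults, missing tenant).
      2. Middleware enforcing the quota (tests: 429 path, header propagation).
      3. Wire into the gateway; run the full-stack tests before review.
    Open questions: burst allowance? where to log rejections?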

pif 2 days ago

Who could have foreseen that laziness would be that demanding?

kordlessagain 2 days ago

I'm using Claude Desktop to develop an agentic crawler system. My workflow is VSCode next to Claude Desktop. The tooling for Claude uses a bunch of MCP servers I wrote for Evolve: https://github.com/kordless/gnosis-evolve

Core Development Capabilities:

- File Discovery & Navigation: file_explorer with pattern matching and recursive search

- Intelligent Code Search: search_in_file_fuzzy with similarity thresholds for finding relevant code sections

- Advanced Code Editing: file_diff_writer with fuzzy matching that can handle code changes even after refactoring

- Backups: backup and restores of any files at any state of change.

- System Monitoring: Real-time log analysis and container management

- Hot Deployment: docker_rebuild for instant container updates (Claude can do the rebuild)

The Agentic Workflow:

- Claude searches your codebase to understand current implementation

- Uses fuzzy search to find related code patterns and dependencies

- Makes intelligent edits using fuzzy replacement (handles formatting changes)

- Monitors logs to verify changes work correctly

- Restarts containers as needed for testing

- Iterates based on log feedback

- Error handling requires analyzing logs and adjusting parsing strategies

- Performance tuning benefits from quick deploy-test-analyze cycles

I've not had any issues with Claude being able to handle changes, even doing things like refactoring overly large HTML files with inline CSS and JS. Had it move all that to a more manageable layout and helped out by deleting large blocks when necessary.

The fuzzy matching engine is the heart of the system. It uses several different strategies working in harmony. First, it tries exact matching, which is straightforward. If that fails, it normalizes whitespace by collapsing multiple spaces, removing trailing whitespace, and standardizing line breaks, then attempts to match again. This handles cases where code has been reformatted but remains functionally identical.

When dealing with multi-line code blocks, the system gets particularly clever. It breaks both the search text and the target content into individual lines, then calculates similarity scores for each line pair. If the average similarity across all lines exceeds the threshold, it considers it a match. This allows it to find code blocks even when individual lines have been slightly modified, variable names changed, or indentation adjusted.
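
Not the actual gnosis-evolve code, but a sketch of that cascade (exact match, then whitespace-normalized, then per-line similarity) looks roughly like this:

    import difflib
    import re

    def normalize(text: str) -> str:
        # collapse runs of spaces/tabs and strip trailing whitespace on each line
        return "\n".join(re.sub(r"[ \t]+", " ", line).rstrip() for line in text.splitlines())

    def fuzzy_find(search: str, content: str, threshold: float = 0.8) -> bool:
        if search in content:                           # 1. exact match
            return True
        if normalize(search) in normalize(content):     # 2. whitespace-insensitive match
            return True
        # 3. per-line similarity: slide a window of the same height over the target
        s_lines, c_lines = search.splitlines(), content.splitlines()
        for i in range(len(c_lines) - len(s_lines) + 1):
            window = c_lines[i:i + len(s_lines)]
            ratios = [difflib.SequenceMatcher(None, a, b).ratio()
                      for a, b in zip(s_lines, window)]
            if ratios and sum(ratios) / len(ratios) >= threshold:
                return True
        return False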

ninetyninenine 2 days ago

Relax, once it can do what you want, you're out of a job.

dboreham 2 days ago

In my team we call it "The Intern".

lc_eckert09 2 days ago

They’re fantastic for making some basic websites or dealing with boilerplate.

I write compilers. Good luck getting an LLM to be helpful in that domain. It can be helpful for breaking down the docs for something like LLVM, but not for writing passes or codegen etc.

paulcole 2 days ago

> but trying to get it to generate code that I am willing to maintain and "put my name on" took longer than writing the code would have

Why don’t you consider that the AI will be the one maintaining it?

  • bagacrap 2 days ago

    If that were possible we'd have already pointed LLMs at the Linux bug tracker (for example) and fixed all the open issues.

    • paulcole 2 days ago

      Was all that code written by LLMs?

Sai_Praneeth 2 days ago

i haven't used cursor or codex or any system that says "agentic coding experience"

i speak in thoughts in my head and it is better to just translate those thoughts to code directly.

putting them into a language for LLMs to make sense of, and understanding the output, is oof... too much overhead. and yeah the micromanagement, correcting mistakes, miscommunications, it's shit

i just code like the old days and if i need any assistance, i use chatgpt

42lux 2 days ago

Use them like a butler or a bad maid. They can do the absolute grunt work now, and with time they might get good enough to take on more serious tasks.

journal 2 days ago

As much as someone else is struggling at something else? It's not for everyone, just like programming is not for everyone. I don't even type the code anymore, I just copy and paste it - is that still programming? I don't remember the last time I had to type out a complete line up to the ;.