I am not doing any of this.
It becomes obsolete in literally weeks, and it also doesn't work 80% of the time. Why write an MCP server for custom tasks when I don't know if the LLM is going to reliably call it?
My rule for AI has been steadfast for months (years?) now. I write documentation for myself (templates, checklists, etc.), and I write it myself, not with AI, because otherwise I spend more time guiding the AI than thinking about the problem. I give AI a chance to one-shot it in seconds; if it can't, I either review my documentation or just do it manually.
A perspective which has helped me is viewing LLM-based offerings strictly as statistical document generators, whose usefulness is entirely dependent upon their training data set plus model evolution, and whose usage is best modeled as a form of constraint programming[0] lacking a formal (repeatable) grammar. As such, and when considering the subjectivity of natural languages in general, the best I hope for when using them are quick iterations consisting of refining constraint sentence fragments.
Here is a simple example which took 4 iterations using Gemini to get a result requiring no manual changes:

# Role
You are an expert Unix shell programmer who comments their code and organizes their code using shell programming best practices.

Create a bash shell script which reads from standard input text in Markdown format and prints all embedded hyperlink URL's.

The script requirements are:
- MUST exclude all inline code elements
- MUST exclude all fenced code blocks
- MUST print all hyperlink URL's
- MUST NOT print hyperlink label
- MUST NOT use Perl compatible regular expressions
- MUST NOT use double quotes within comments
- MUST NOT use single quotes within comments

EDIT: For reference, a hand-written script satisfying the above (excluding comments for brevity) could look like:
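(The hand-written script itself did not survive the copy. Below is a minimal sketch of one way to satisfy those constraints with POSIX awk instead of PCRE; it is an illustration, not the commenter's original.)

    #!/usr/bin/env bash
    # Read Markdown on stdin and print the URL of every [label](url) hyperlink,
    # skipping fenced code blocks and inline code spans. POSIX awk only, no PCRE.
    awk '
      /^```/   { in_fence = !in_fence; next }   # toggle fenced code block state
      in_fence { next }                         # ignore lines inside fences
      {
        line = $0
        gsub(/`[^`]*`/, "", line)               # drop inline code spans
        # pull out each [label](url) and print only the URL part
        while (match(line, /\[[^]]*\]\([^)]*\)/)) {
          link = substr(line, RSTART, RLENGTH)
          sub(/^\[[^]]*\]\(/, "", link)
          sub(/\)$/, "", link)
          print link
          line = substr(line, RSTART + RLENGTH)
        }
      }
    '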
0 - https://en.wikipedia.org/wiki/Constraint_programming

Do you get tangibly different results if you don't capitalize MUST (NOT)?
The ability of newer agents to develop plans that can be reviewed and, most importantly, to run a build-test-modify cycle has really helped. You can hand an agent some junior-programmer task and then go off and do something else.
An alternative is to view the AI agent as a new developer on your team. If existing guidance + one-shot doesn't work, revisit the documentation and guidance (ie dotMD file), see what's missing, improve it, and try again. Like telling a new engineer "actually, here is how we do this thing". The engineer learns and next time gets it right.
I don't do MCPs much because of effort and security risks. But I find the loop above really effective. The alternative (one-shot or ignore) would be like hiring someone, then if they get it wrong, telling them "I'll do it myself" (or firing them)... But to each his own (and yes, AI are not human).
I don't think you can say it learns - and that is part of the issue. Time mentoring a new colleague is well spent making the colleague grow professionally.
Time hand-holding an AI agent is wasted when all your guidance inevitably falls out of the context window and it starts making the same mistakes again.
> The engineer learns and next time gets it right.
Anthropomorphizing LLMs like that is the path to madness. That's where all the frustration comes from.
On the contrary; stubborn refusal to anthropomorphize LLMs is where the frustration comes from. To a first approximation, the models are like little people on a chip; the success and failure modes are the same as with talking to people.
If you look, all the good advice and guidelines for LLMs are effectively the same as for human employees - clarity of communication, sufficient context, not distracting with bullshit, information hygiene, managing trust. There are deep reasons for that, and as a rule of thumb, treating LLMs like naive savants gives reliable intuitions for what works, and what doesn't.
I treat LLMs as statistics driven compression of knowledge and problem solving patterns.
If you treat it as such it is all understandable where they might fail and where you might have to guide them.
Also treat it as something that has been biased during training to produce immediately impressive results. This is why it bundles everything into single files and writes try/catch patterns where the catch returns mock data, all to make a one-shot demo look impressive.
That is what you have to actively fight against, to make them prioritise the scalability of the codebase and its solutions.
Exactly this. People treat LLMs like they treat machines and then are surprised that "LLMs are bad".
The right mental model for working with LLMs is much closer to "person" than to "machine".
I agree. Software development is on an ascent to a new plateau. We have not reached that yet. Any skill that is built up now is at best built on a slope.
I think both are helpful
1. starting fresh, because of context poisoning / long-term attention issues
2. lots of tools makes the job easier, if you give them a tool discovery tool, (based on Anthropics recent post)
We don't have reliable ways to evaluate all the prompts and related tweaking. I'm working towards this with my agentic setup. I added time travel for sessions based on Dagger yesterday, with forking, cloning, and a registry probably today.
Gemini CLI at this stage isn't good at complex coding tasks (vs. Claude Code, Codex, Cursor CLI, Qoder CLI, etc.). Mostly because of the simple ReAct loop, compounded by relatively weak tool calling capability of the Gemini 2.5 Pro model.
> I haven't tried complex coding tasks using Gemini 3.0 Pro Preview yet. I reckon it won't be materially different.
Gemini CLI is open source and being actively developed, which is cool (/extensions, /model switching, etc.). I think it has the potential to become a lot better and even close to top players.
The correct way to use Gemini CLI is: ABUSE IT! The 1M context window (soon to be 2M) and generous free daily quota are huge advantages. It's a pity that people don't use it enough (ABUSE it!). I use it as a TUI/CLI tool to orchestrate tasks and workflows.
> Fun fact: I found Gemini CLI pretty good at judging/critiquing code generated by other tools LoL
Recently I even hooked it up with Homebrew via MCP (other Linux package managers as well?) and a local-LLM-powered knowledge/context manager (Nowledge Mem). You can get really creative abusing Gemini CLI; unleash the Gemini power.
I've also seen people use Gemini CLI in SubAgents for MCP processing (it did work and avoided polluting the main context). I couldn't help laughing when I first read this -> https://x.com/goon_nguyen/status/1987720058504982561
Gemini CLI is a wild beast. The stories of it just going bonkers and refactoring everything it reads on its own are not rare. My own experience was something like, "Edit no code. Only give me suggestions. blah blah blah" first thing it does is edit a file without any other output. It's completely unreliable.
Pro 3 is -very- smart, but its tool use and instruction following aren't great.
I've been using Gemini 3 in the CLI for the past few days. Multiple times I've asked it to fix one specific lint error, and it goes off and fixes all of them. A lot of times it fixes them by just disabling lint rules. It makes reviewing much harder. It really has a mind of its own and sometimes starts grinding for 20 minutes doing all kinds of things - most of them pretty good, but again, challenging to review. I wish it would stick to the task.
> I haven't tried complex coding tasks using Gemini 3.0 Pro Preview yet. I reckon it won't be materially different.
In my limited testing, I found that Gemini 3 Pro struggles with even simple coding tasks. Sure, I haven't tested complex scenarios yet and have only done so via Antigravity. But it is very difficult to do that with the limited quota it provides. Impressions here - https://dev.amitgawande.com/2025/antigravity-problem
Are we using different models? Here is a simulation of Chernobyl reactor 4 using research grade numerical modeling I made with it in a few days: https://rbmk-1000-simulator-162899759362.us-west1.run.app/
Thanks for sharing, insightful.
Personally, I consider Antigravity a positive and ambitious launch. My initial impression is that there are many rough edges to be smoothed out. I hit many errors, like 1. errors communicating with Gemini (Model-as-a-Service), 2. agent execution terminated due to errors, etc., but somehow it completed the task (the verification/review UX is bad).
Pricing for paid plans with AI Pro or Workspace would be key for its adoption, when Gemini 3.x and Antigravity IDE are ready for serious work.
Interesting. I did not face many issues while communicating with Gemini. But I believe these issues will iron themselves out -- Google does seem to have rushed the launch.
I like AI a lot. I try to use it as much as I can. It feels like it is becoming an essential part of making me a more effective human, like the internet or my iPhone. I do not see it as a bad thing.
But I can't help getting "AI tutorial fatigue" from so many posts telling me how to use AI. Most are garbage; this one is better than most. It's like how JavaScript developers endlessly post about the newest UI framework or JS build tool. This feels a lot like that.
Notable re author: “Addy Osmani is an Irish Software Engineer and leader currently working on the Google Chrome web browser and Gemini with Google DeepMind. A developer for 25+ years, he has worked at Google for over thirteen years, focused on making the web low-friction for users and web developers. He is passionate about AI-assisted engineering and developer tools. He previously worked on Fortune 500 sites. Addy is the author of a number of books including Learning JavaScript Design Patterns, Leading Effective Engineering Teams, Stoic Mind and Image Optimization.“
Also a winner of the Irish Young Scientist competition, 2 years before Patrick Collison. https://en.wikipedia.org/wiki/Young_Scientist_and_Technology...
He's published 11 books in the past 5 years?
Is he using AI assisted writing, too?
Some parts of the books are expanded versions of blog posts.
This just after "Google Antigravity exfiltrates data via indirect prompt injection attack"
https://news.ycombinator.com/item?id=46048996
Who the heck trusts this jank to have free rein over their system?
I really wish there were a de facto state-of-the-art coding agent that is LLM-agnostic, so that LLM providers wouldn't bother reinventing their own wheels like Codex and Gemini-CLI. They should be pluggable providers, not independent programs. In this way, the CLI would focus on refining the agentic logic and would grow faster than ever before.
Currently Claude Code is the best, but I don't think Anthropic would pivot it into what I described. Maybe we still need to wait for the next groundbreaking open-source coding agent to come out.
There is Aider (aider.chat), and it has been there for couple years now. Great tool.
Alas, you don't install Claude Code or Gemini CLI for the actual CLI tool. You install it because the only way agentic coding makes sense is through subscription billing at the vendor - SOTA models burn through tokens too fast for pay-per-use API billing to make sense here; we're talking literally a day of basic use costing more than a monthly subscription to the Max plan at $200 or so.
Aider is in a sad state. The maintainer hasn't been maintaining it for quite some time now (look at the open PRs and issues). It's definitely not state of the art anymore, but it was one of the first and best. A fork, Aider CE, was created by some members of the Discord community: https://github.com/dwash96/aider-ce The fork looks promising and works well, but there is (sadly) so much more development happening in the other AI CLI tools nowadays.
Opencode (from SST; not the thing that got rebranded as Crush) seems to be just that. I've had a very good experience with it for the last couple of days; having previously used gemini-cli quite a bit. Opencode also has/hosts a couple of "free" models options right now, which are quite decent IMO.
https://github.com/sst/opencode
There are many many similar alternatives, so here's a random sampling: Crush, Aider, Amp Code, Emacs with gptel/acp-shell, Editor Code Assistant (which aims for an editor-agnostic backend that plugs into different editors)
Finally... there is quite a lot of scope for co-designing the affordances / primitives supported by the coding agent and the LLM backing it (especially in LLM post-training). So factorizing these two into completely independent pieces currently seems unlikely to give the most powerful capabilities.
> I really wish there were a de facto state-of-the-art coding agent that is LLM-agnostic
Cursor?
It’s really quite good.
Ironically it has its own LLM now, https://cursor.com/blog/composer, so it’s sort of going the other way.
Model agnostic tools I would say:
Roo Code or maybe Kilo (which is a fork of Roo)
Goose?
YMMV, but I think all of this is too much; you generally don't need to think about how to use an AI properly, since screaming at it usually works just as well as very finely tuned instructions.
You don't need Claude Code, gemini-cli, or Codex. I've been doing it raw as a (recent) LazyVim user with a proprietary agent that has three tools: git, ask, and ripgrep, and currently Gemini 3 is by far the best for me, even without all these tricks.
Gemini 3 has very high token density and a significantly larger context than any other model that is actually usable. Every 'agent' I start shoves five things into the context:
- most basic instructions such as: generate git format diff only when editing files and use the git tool to merge it (simplified, it's more structured and deeper than this)
- tree command that respects git ignore
- $(ask "summarize $(git diff)")
- $(ask "compact the readme $(cat README.MD"))
- (ripgrep tools, mcp details, etc)
when the context is too bloated I just tell it to write important new details to README.MD and then start a new agent
https://github.com/kagisearch/ask
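For what it is worth, here is a rough sketch of how that kind of context bootstrap could be scripted. The ask invocation, file names, and prompts are placeholders for illustration, not the commenter's actual setup.

    #!/usr/bin/env bash
    # Assemble a starting context for a fresh agent session.
    # Assumes an ask command that sends a prompt to a model and prints the reply;
    # swap in whatever wrapper you actually use.
    {
      cat AGENT_INSTRUCTIONS.md        # base rules: emit git format diffs, merge via the git tool, etc.
      echo "## Repository layout"
      tree --gitignore                 # tree 2.x respects .gitignore with this flag
      echo "## Recent changes"
      ask "summarize this diff: $(git diff)"
      echo "## Project summary"
      ask "compact this readme: $(cat README.MD)"
    } > context.md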
I'm doing something very similar but even simpler and Gemini 3 is absolutely crushing it. I tried to do this with other models in the past, but it never really felt productive.
I don't even generate diffs, just full files (though I try and keep them small) and my success rate is probably close to 80% one-shotting very complex coding tasks that would take me days.
I've been using Gemini CLI for months now, mainly because we have a free subscription for it through work.
Tip 1, it consistently ignores my GEMINI.md file, both global and local. Even though it's always saying that "1 GEMINI.md file is being used", probably because the file exists in the right path.
Tip 12, had no idea you could do this, seems like a great tip to me.
Tip 16 was great, thanks. I've been restarting it every time my environment changes for some reason, or having it run direnv for me.
All the same warnings about AI apply to Gemini CLI: it hallucinates wildly.
But I have to say Gemini CLI gave me my first real fun experience using AI. I was a latecomer to AI, but what really hooked me was when I gave it permission to freely troubleshoot a k8s PoC cluster I was setting up. Watching it autonomously fetch logs and objects and troubleshoot until it found the error was the closest thing to getting a new toy for Christmas I've had in many years.
So I've kept using it, but it is frustrating sometimes when the AI is behaving so stupidly that you just /quit and do it yourself.
Thanks for sharing. Gemini CLI doing live troubleshooting for a K8s cluster is surreal. I am keen to try that out, since I have just created RKE2 clusters.
IMHO, one understated downside in today's AI/Agentic/Vibe-coding options is that ALL of them are evolving a bit too fast before any of these types of "best practices" can become a habit with a critical mass of developers, rendering many such tips obsolete very quickly (as another person rightfully pointed out).
Sure, software in general will keep evolving rapidly but the methods and tools to build software need to be relatively more stable. E.g. many languages and frameworks come and go, but how we break down a problem, how we discover and understand codebases, etc. have more or less remained steady (I think).
I see this as a paradox and have no idea what the state of equilibrium will look like.
Gemini CLI sucks. Just use Opencode if you have to use Gemini. They need to rebuild the CLI just as OAI did with Codex.
YMMV I guess, but it's my go-to tool; fast and reliable results, at least for my use cases.
I'm pretty sure we are in an apple vs android situation, where you give lifetime apple users an android phone, and after a day they report that android is horrid. In reality, they just aren't used to how stuff is done on android.
I think many devs are just in tune with the "nature" of Claude, and run aground easier when trying to use gemini or Chatgpt. This also explains why we get these perplexing mixed signals from different devs.
There are some clear objective signals that aren’t just user preference. I shelled out the $250 for Gemini’s top tier and am profoundly disappointed. I had forgotten that loops were still a thing. I’ve hit this multiple times in Gemini CLI, and in different projects. It gets stuck in a loop (as in the exact same, usually nonsense, message over and over) and the automated loop detection stops the whole operation. It also stops in the middle of an operation very frequently. I don’t hit either of these in Claude Code or Codex.
There certainly is some user preference, but the deal breakers are flat out shortcomings that other tools solved (in AI terms) long ago. I haven’t dealt with agent loops since March with any other tool.
have you received your money's worth in other products?
Agreed. I've been using Claude Code daily for the past year and Codex as a fallback when Claude gets stuck. Codex has two problems: its Windows support sucks and it's way too "mission driven" vs. the collaborative Claude. Gemini CLI falls somewhere in the middle, has some seriously cool features (Ctrl+X to edit the prompt in notepad), and its web research capability is actually good.
Claude had the same feature for editing the prompt in $EDITOR
Codex prompt editing sucks
I'm constantly floored by how well claude-cli works, while gemini-cli stumbled on something simple the first time I used it, and Gemini 3 Pro's release availability was just bad.
Well Opencode also completely replaced its TUI a few weeks ago too.
BTW Gemini 3 via Copilot doesn't currently work in Opencode: https://github.com/sst/opencode/issues/4468
Copilot on Opencode is not good. It’s all over the place which is a shame because Copilot is one of the best values.
> To use OpenCode, you’ll need:
> A modern terminal emulator like:
> WezTerm, cross-platform
> Alacritty, cross-platform
> Ghostty, Linux and macOS
> Kitty, Linux and macOS
What's wrong with any terminal? Are those performance gains that important when handling a TUI? :-(
Edit:
Also, I don't see Gemini listed here:
https://opencode.ai/docs/providers/
Only Google Vertex AI (?): https://opencode.ai/docs/providers/#google-vertex-ai
Edit 2:
Ah, Gemini is the model and Google Vertex AI is like AWS Bedrock, it's the Google service actually serving Gemini. I wonder if Gemini can be used from OpenCode when made available through a Google Workspace subscription...
It's silly of them to say you need a "modern terminal emulator", it's wrong and drives people away. I'm using xfce4-terminal.
Gemini 3 via any provider except Copilot should work in Opencode.
what happened with Codex? Did they rebuild it?
I too am curious. My daily driver has been Claude Code CLI since April. I just started using Codex CLI and there are lot of gaps--the most annoying being permissions don't seem to stick. I am so used to plan mode in Claude Code CLI and really miss that in Codex.
Codex CLI switched from a typescript implementation to a Rust based one.
The model needs to be trained to use the harness. Sonnet 4.5 and gpt-5.1-codex-max are "weaker" models in abstract, but you can get much more mileage out of them due to post-training.
> $ time gemini -p "hello world"
> Loaded cached credentials.
> Hello world! I am ready for your first command.
>
> gemini -p "hello world"  2.35s user 0.81s system 33% cpu 29.454 total
I'm seeing between 10 and 80 seconds for responses to "hello world", 10-20 seconds of which is just loading the god damn credentials. This thing needs a lot of work.
All these tips and tricks just to get out-coded by some guy rawdogging Copilot in VS Code.
It's inferior, but Copilot is even more inferior. I used it again recently, just to see, after Cursor and Claude Code. It's laughably bad. Almost like they don't care.
I am worried that we are diverging with CLI updates across models. I wish we had converged towards a common functionality and behaviour. Instead, we need to build knowledge of model-specific nuances. The cost of choosing a model is high.
My tip: Move away from Google to an LLM that doesn't respond with "There was a problem getting a response" 90% of the time.
Are we getting billed for these? The billing is so very not transparent.
My experience working in FAANG.. Nobody knows
we need a Nate Bargatze skit for these quips
It would be nice to have official confirmation. Once tokens get back to the user, they are likely already counted.
Sucks when the LLM goes on a rant only to stop because of hardcoded safeguards, or what I encounter often enough with Copilot: it generates some code, notices it's part of existing public code and cancels the entire response. But that still counts towards my usage.
Copilot definitely bills you for all the errors.
Gemini appears to bill random amounts for reasons nobody knows.
[dead]
I had a terrible first impression with Gemini CLI a few months ago when it was released because of the constant 409 errors.
With Gemini 3 release I decided to give it another go, and now the error changed to: "You've reached the daily limit with this model", even though I have an API key with billing set up. It wouldn't let me even try Gemini 3 and even after switching to Gemini 2.5 it would still throw this error after a few messages.
Google might have the best LLMs, but its agentic coding experience leaves a lot to be desired.
I had to make a new API key. My old one got stuck with this error; it's on Google's end. New key resolved immediately.
And then losing half a day setting up billing, with a limited virtual credit card so you have at least some cost control.
For me, I had just set up a project and set billing to that. Making a second key and assigning the billing to that was instant; I got to reuse it.
I have sympathy for any others who did not get so lucky
A lot of the time Gemini models get stuck in a loop of errors, and a lot of the time they fail at edit/read or other simple function calls.
It's really, really terrible at agentic stuff.
Not so much with Gemini 3 Pro (which came out a few days ago)... to the point that the loop detection that they built into gemini-cli (to fight that) almost always over-detects, thinking that Gemini 3 Pro is looping when it in fact isn't. Haven't had it fail at tool calls either.
Interesting, I run into loop detection in 2.5 Pro but haven't seen it in 3 Pro. Maybe it's the type of tasks I throw at it, though; I only use 3 at work, and that code base is much more mature and well defined than my random side projects.
Tried it in V0; it always gets into an infinite loop.
Will give the CLI another shot.
we've gone from 'RTFM' to 'here's 30 tips to babysit your AI assistant' and somehow this is considered progress
It's simple, just follow these 30 tips and tricks :D
Gemini 3 in the CLI is relentless if you give it detailed specs, and other than API errors it's just great. I'd still rank Claude models higher, but Gemini 3 is good too.
And the GPT-5 Codex has a very somber tone. Responses are very brief.
The problem is that Gemini CLI simply doesn't work. Beyond the simplest of tasks, like creating a new release, it is useless as a coding assistant. It doesn't have a plan mode; it jumps right into coding and then gets stuck in the middle of spaghetti code.
Gemini models are actually pretty capable but Gemini CLI tooling makes them dumb and useless. Google is simply months behind Anthropic and OpenAI in this space!
Is there a similar guide/document for Claude Code?
>this lets you use Gemini 2.5 Pro for free with generous usage limits
Considering that access is limited to the countries on the list [0], I wonder what motivated their choices, especially since many Balkan countries were left out.
[0]: https://developers.google.com/gemini-code-assist/resources/a...
For Europe it's EU + UK + EFTA plus for some reason, Armenia.
Looking through this, I think a lot of these also apply to Google Antigravity which I assume just uses the same backend as the CLI and just UI wraps a lot of these commands (e.g. checkpointing).
Tips and tricks for playing slot machines
Best practices for gambling
Kinda useful, especially tip 15 and tip 26.
There needs to be a lot more focus on the observability and showing users what is happening underneath the hood (especially wrt costs and context management for non-power users).
A useful feature Cursor has that Antigravity doesn't is the context wheel that increases as you reach the context window limit (but don't get me started on the blackbox that is Cursor pricing).
Gemini-CLI on Termux does not work anymore. Gemini itself found a way to fix the problem, but I did not totally grok what it was going to do. It insisted my Termux was old and rotten.
Make sure you've turned off the "alternate buffer" setting
Agentic coding seems like it's not the top priority; the aim is more at capturing search engine users, which is understandable.
Still, I had high hopes for Gemini 3.0 but was let down by the benchmarks. I can barely use it in the CLI; in AI Studio, however, it's been pretty valuable, though not without quirks and bugs.
Lately it seems like all the agentic coders like Claude and Codex are starting to converge, differentiated only by latency and overall CLI UX and usage.
I would like to use Gemini CLI more (even Grok) if it were possible to use it like Codex.
I love the model, hate the tool. I’ve taken complex stuff and given it to Gemini 3 and been impressed, but Anthropic has the killer app with Claude Code. The interplay of sonnet (a decent model) and the tools and workflow they’ve got with Claude code around it supercharge the outcome. I tried Gemini cli for about 5 seconds and was so frustrated, it’s so stupid at navigation in the codebase it takes 10x as long to do anything or I have to guide it there. I have to supervise it rather than doing something important while Claude works in the background
A lot of it seems to mirror the syntax of Claude Code.
Integration with Google Docs/Spreadsheets/Drive seems interesting but it seems to be via MCP so nothing exclusive/native to Gemini CLI I presume?
There seem to be an awful lot of "could"s and "might"s in that part. Given how awfully limited the Gemini integration inside Google Docs is, it's an area that just makes me feel Google is executing really slowly on this.
I've built a document editor that has AI properly integrated - provides feedback in "Track Changes" mode and actually gives good writing advice. If you've been looking for something like this - https://owleditor.com
It looks nice, but for my use it's very specifically not reviews I want from AI integration in an editor; I want to be able to prompt it to write or rewrite large sections, or make repeated references to specific things, with minimal additional input. I specifically don't want to go through and approve edit by edit; I'll read through a diff and approve all at once, or just tell it how to fix its edits.
Claude at least is more than good enough to do this for dry technical writing (I've not tried it for anything more creative), and so I usually end up using Claude Code to do this with markdown files.
Am I stupid? I run /corgi, nothing happens and I don't see a corgi. I have the latest version of the gemini CLI. Or is it just killedbygoogle.com
It would/will be interesting to see this modified to include Antigravity alongside Gemini CLI.
Addy delivers!
Nice breakdown. Curious if you’ve explored arbitration layers or safety-bounded execution paths when chaining multiple agentic calls?
I’m noticing more workflows stressing the need for lightweight governance signals between agents.
How many of these 30 tips can be replaced by Tip 8: tell Gemini to read the tips and update its own prompt?
Antigravity obsoleted Gemini CLI, right?
I really tried to get Gemini to work properly in agent mode. But it way too often went crazy: it started rewriting files as empty, leaving comments like "here you could implement the requested function", and much more, including getting stuck in permanent printing loops of stuff like "I've done that. What's next on the debugger? Okay, I've done that. What's next on the with? Okay, I've done that. What's next on the delete? Okay, I've done that. What's next on the in? Okay, I've done that. What's next on the instanceof? Okay, I've done that. What's next on the typeof? Okay, I've done that. What's next on the void? Okay, I've done that. What's next on the true? Okay, I've done that. What's next on the false? Okay, I've done that. What's next on the null? Okay, I've done that. What's next on the undefined? Okay, I've done that..." which went on for about an hour (yes, I waited to see how long it would take for them to cut it off).
It's just not really good yet.
I recently tried IntelliJ's Junie and I have to say it works rather well.
I mean, at the end of the day all of them need a human in the loop, and the result is only as good as your prompt. With Junie, though, I at least got something of a result most of the time, while with Gemini a 50% rate would have been good.
Finally: I still don't see agentic coding being used for production; it's just not there yet in terms of quality. For research and fun? Why not.
Why is this AI generated slop so highly upvoted?
Even though the doc _might_ be AI generated, that repo is Addy Osmani's.
Of Addy Osmani fame.
I seriously doubt he went to Gemini and told it "Give me a list of 30 identifiable issues when agentic coding, and tips to solve them".
Because it's good slop.
I have never had any luck using Gemini. I had a pretty good app created with Codex. Due to the hype, I thought I'd give Gemini a try. I asked it to find all the ways to improve security and architecture/design. Sure enough, it gave me a list of components etc. that didn't match best patterns and practices. So I let it refactor the code.
It fucked up the entire repo. It hard-coded tenant IDs and user IDs, completely destroyed my UI, and broke my entire GraphQL integration. Set me back two weeks of work.
I do admit the browser version of Gemini chat does a much better job at providing architecture and design guidance from time to time.
I can't emphasize this enough: it doesn't matter how good a model is or what CLI I'm using, use git and a chroot at the least (a container is easier, though).
Always make the agent write a plan first and save it to something like plan.md, then tell it to update the list of finished tasks in status.md as it finishes each task from plan.md, and to let you review the changes before proceeding to the next task.
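A minimal sketch of that workflow, assuming Docker for the sandbox; the image name, agent command, and branch name are placeholders rather than any specific tool:

    # Give the agent an isolated checkout and a throwaway container to work in.
    git worktree add -b agent/fix-123 ../agent-task
    docker run --rm -it \
      -v "$(realpath ../agent-task):/workspace" \
      -w /workspace \
      some-agent-image \
      some-agent-cli "Write plan.md first. After each task in plan.md, update status.md and stop for review."

    # Review between tasks, then merge or discard the worktree when done.
    git -C ../agent-task diff
    git worktree remove ../agent-task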
Do you use AI agents on repos without version control?
> Set me back 2 weeks of work.
How did this happen?
Did you let the agent loose without first creating its own git worktree?
What's the benefit of git worktree? I imagine you can just not give the agent access to git and you're in the same spot?
I'll reply to myself since apparently people downvote and move on:
They're useful for allowing agents to work in parallel. I imagine some people give them access to git and tools, sandbox the agents, then let a bunch of them work in separate git worktrees branched off the same base, then come back and compare and contrast what the agents have done, to accelerate their work.
I think there is value in that, but it also feels like a lot of very draining work, and I imagine that long term you're no longer in control of the code base. Which, I mean, great if you're working on a huge code base, since you already don't control that...
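Concretely, the parallel pattern above might look like the following; branch names are illustrative. Each agent gets its own worktree and branch cut from the same base (git will not let two worktrees check out the same branch), and you compare the results afterwards.

    # One isolated checkout per agent, all cut from main.
    git worktree add -b try/agent-a ../agent-a main
    git worktree add -b try/agent-b ../agent-b main

    # Run one agent per directory, then compare what each produced.
    git diff main..try/agent-a
    git diff main..try/agent-b

    # Keep the best attempt, drop the rest.
    git worktree remove ../agent-b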
Welcome to engineering management.
The average quality of software engineering is abysmal, and the average quality of software engineering management is even worse. I'm not very enthusiastic about where this is headed.
tfw people are running agents outside containers
Yeah, this is something I need to get to.
Apologies, I meant branch. I nuked the branch. But it set me back a lot of time, as I thought it might be just a few things here and there.