Expectations for Agentic Coding Tools: Testing Gemini CLI

By David Eastman


Before I launch Gemini CLI, Google's open source AI terminal app, let's look at what the "quality-of-life" expectations are for agentic applications. Now that we have several of these tools (Claude Code, Warp and OpenAI Codex are other examples), we have a better sense of what a developer needs from them.

Firstly, it needs to be easy to get started on the command line in your terminal. Developers are still the primary target audience for agentic apps, so environment variables or flags for options are fine. But getting straight in is vital.

For example, connecting your API key to your account can be done via an environment variable or in a web console. Knowing when you are running out of tokens (whether freely granted or paid for) is now an important gauge.

When we hit the start button, we need a simple session intro summary so that we know at least the following things:

The model in use;
The project directory;
Any other pertinent permission or account information, or if a working file is being watched.

A working file in the project directory, where assumptions based on the project are written and can be tracked (like Claude Code's CLAUDE.md file), is an important innovation: it moves beyond a session life cycle into a project life cycle.

Permission boundaries have to be respected; and in general we are in the early days regarding when to allow the large language model (LLM) to change files, and where. I've argued that forcing vibe coders to use git is a bit malign, but then again, if you fail to plan you are clearly planning to fail.

Showing an execution plan that the LLM will follow to fulfill a request feels good, but has not yet proven essential. Unless this is done, though, the exact tactics an LLM will use remain opaque. A simple checkbox list will suffice.

A quit session summary showing time, requests and tokens used is great. Full accounts can really only be tracked on a user page.

There are plenty of other features that will creep into the above list, but we need to be aware of backsliding as well as genuinely useful innovations.

Starting up Gemini

As with all cloud-based LLMs, we must show our fealty before we get access to the precious tokens. Go to Google AI Studio to generate a key. Currently you are given 100 requests a day (check the other tier limits here).

We can install Gemini via npm at the terminal:
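At the time of writing, the CLI is published on npm as @google/gemini-cli; a global install looks like this (worth double-checking the package name against the project's README):

```shell
# Install the Gemini CLI globally so the `gemini` command is on your PATH
npm install -g @google/gemini-cli
```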

Next, set your API key as an environment variable. I'm doing it here in the command line on my MacBook:
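The CLI picks the key up from the GEMINI_API_KEY environment variable; the value below is a placeholder, not a real key:

```shell
# Make the key available to the current shell session only
export GEMINI_API_KEY="your-key-here"
```

Putting the export in your shell profile persists it across sessions, at the cost of leaving the key in a plain-text file.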

Then type the command gemini and we are off:

As I mentioned in the quality-of-life section above, this does the important thing of pointing out the active model (Gemini 2.5 Pro in this case) as well as reflecting the project directory.

The theme selection screen disappears as soon as you press return, but I assume you can bring it back. It takes up quite a lot of space on the introduction screen.

As with Claude Code, there is a Markdown file (GEMINI.md in this case) for request customization. I won't use it in this post.

What does "no sandbox" mean? The bad news is that Gemini starts off with no restrictions on where your AI may roam. I'm afraid that isn't very sensible, but Gemini gives you fairly straightforward options. The good news is that we can use macOS Seatbelt, which starts off with a sensible policy of restricting access to within the project directory.

So I'll exit this session (type /quit) and we can restart with this basic security.

The quit screen provides some of the stats I referred to earlier:

We can use Seatbelt by just setting an environment variable in this session, then adding a flag:
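The exact sandbox switches have shifted between releases, so treat this as a sketch based on the CLI's sandbox documentation: a SEATBELT_PROFILE environment variable selects the macOS policy, and a --sandbox (-s) flag turns sandboxing on:

```shell
# Choose a Seatbelt policy (permissive-open restricts writes to the project directory)
export SEATBELT_PROFILE=permissive-open

# Restart the CLI with sandboxing enabled
gemini --sandbox
```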

Now we are good to go, as we have our seatbelt on.

As I did with Codex in a recent post, let's try out the merge of two JSON files. As before, I'm looking for how the structure supports me, as much as the outcome. If you don't want to read the previous post, imagine I have a city website that uses JSON data. I have a JSON file called original_cities.json:
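The file itself appeared as a screenshot in the original post; a hypothetical stand-in with the same traits (invented city entries, a deliberate misspelling and a trailing comma) might look like:

```json
[
  { "name": "London", "summary": "The capitol of England", "image": "london.jpg" },
  { "name": "Paris", "summary": "The capital of France", "image": "paris.jpg" },
]
```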

The spelling errors and formatting error (extra comma) are intentional; we want to see if we can bait the LLM.

I also have another file, called updated_cities.json:
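Again, the screenshot is gone, so here is a hypothetical stand-in: it adds a Rome entry and changes one image reference relative to the original file:

```json
[
  { "name": "London", "summary": "The capital of England", "image": "london_v2.jpg" },
  { "name": "Paris", "summary": "The capital of France", "image": "paris.jpg" },
  { "name": "Rome", "summary": "The capital of Italy", "image": "rome.jpg" }
]
```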

I want to update the first file with the contents of the second. This simulates slightly out-of-sync working. I have one condition: I want any updated image references (that I may not have yet) copied into a key called "imageintended" so that I don't use the data and cause a crash.

Essentially all the merge should do is add the Rome entry to the first file and introduce the new image references without overwriting the existing image key.
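The rule I'm asking for can be sketched in ordinary code. This is a hypothetical Python version of the merge logic, not what Gemini generated; apart from "image" and "imageintended", the names are my own:

```python
def merge_cities(original, updated):
    """Merge updated city entries into the originals.

    New entries are added; for existing entries, a changed "image"
    value is recorded under "imageintended" rather than overwriting
    the "image" key.
    """
    by_name = {city["name"]: dict(city) for city in original}
    for city in updated:
        existing = by_name.get(city["name"])
        if existing is None:
            by_name[city["name"]] = dict(city)  # brand-new entry, e.g. Rome
            continue
        for key, value in city.items():
            if key == "image" and existing.get("image") not in (None, value):
                existing["imageintended"] = value  # keep the old "image" intact
            else:
                existing[key] = value
    return list(by_name.values())
```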

So my project folder looks like this. Note that I haven't created a GEMINI.md file:

I'll use the same request I gave to Codex:

"please update the JSON file original_cities.json with the contents of the file updated_cities.json but if the 'image' field is different, please update or write a new 'imageintended' field with the new value instead"

So let鈥檚 see what it does. This task may look specific, but is actually a bit vague, which reflects a request from the average human.

After getting confused about its project file, it gave me a perfectly good answer:

Updating text, adding the new entry and not overwriting any values in the "image" key: all done. It didn't try to fix inconsequential spelling and didn't get confused by the trailing comma. It was far quicker than Codex as well.

I checked the file, and indeed the changes were made. Before it answered, it didn鈥檛 quite make a plan, but gave me a fairly basic explanation of what it would do:

As the outcome was entirely correct, the process didn't really matter. But only by checking intentions can you really correct LLM "thinking" when it takes the wrong path.

I'll exit to show the final expenditure summary:

As I said, this isn't a direct LLM comparison, but Gemini gave me an efficient agentic experience. I'm sure Google can fill in the missing quality-of-life features I mentioned (specifically, some running stats on token usage), but it is definitely ready for action right now. There is a growing coterie of agentic terminal applications out there for developers to try, and Gemini CLI is a solid addition to that list.
