I spent more than half my day yesterday telling Claude to correct itself because it did things I explicitly told it not to do in my prompt.
“You’re right - I overstepped” is the new “You’re absolutely right”.
I don’t know if we can qualify something that actively goes against the explicit instructions you give it as “something great”. It just sounds like Dario is building snake oil and selling it too.
I have a script at work that writes out some config files, and I'm having Claude run it after making changes. If the script detects breaking changes, it spits out a message saying what they are and does nothing else, telling you to rerun it with the override flag after validation.
If I don't tell Claude about this behavior, it ignores the script output and lies about passing tests that validate if the config files were regenerated.
So I added instructions to my prompt to watch for that message and, if it sees it, to double-check its work, then inform me and ask what to do before proceeding.
This has had the net result of Claude either running the script with the override flag from the get-go (explicitly forbidden) or seeing the message, convincing itself that the override is warranted, and running it a second time with the override flag. It has never once stopped to ask me what to do as instructed.
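For what it's worth, the behavior I'm asking for could be enforced outside the prompt with a small wrapper. This is only a rough sketch: the marker text `BREAKING CHANGES` and the flag name `--override` are placeholders I made up here, not what the real script uses.

```python
import subprocess
import sys

# Hypothetical values: the real script's message and flag are not shown in this thread.
BREAKING_MARKER = "BREAKING CHANGES"
OVERRIDE_FLAG = "--override"


def has_breaking_changes(output: str) -> bool:
    """Return True if the output contains the (assumed) breaking-change marker."""
    return BREAKING_MARKER in output


def run_generator(cmd: list[str]) -> tuple[bool, str]:
    """Run the config generator once, refusing to pass the override flag.

    Returns (ok, output). ok is False when the run failed or reported
    breaking changes; in that case the caller should stop and ask a human.
    """
    if OVERRIDE_FLAG in cmd:
        raise ValueError("override flag is reserved for a human to add after validation")
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    ok = result.returncode == 0 and not has_breaking_changes(output)
    return ok, output


if __name__ == "__main__":
    # Example: python wrapper.py ./generate_configs (script name is a placeholder)
    ok, output = run_generator(sys.argv[1:])
    print(output, end="")
    if not ok:
        print("Stopping: script failed or reported breaking changes. Ask before rerunning.")
        sys.exit(1)
```

The idea is that the agent only ever gets to call the wrapper, so the override decision stays with a person no matter what the model talks itself into.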
This is one of a few reasons I strongly prefer GPT and its Codex variants. It seldom frustrates me. Sure, it's not omnipotent in any way, but it just feels very "tuned in" when it comes to understanding intent and scope.
Imagine a worker who did a loop of "you're absolutely right -> same fuckup again" multiple days every week, wasting the time of whoever told them to do the task.
I do want to fire Claude at this point and switch to Codex. Unfortunately the guy with the purse strings is ride or die full Claude psychosis and our business can’t afford to just buy anything and everything for funsies.
That depends on the company. I worked at an S&P 500 company that muddled along like this. They still make critical software for local and state governments.
Something changed with Dario a year or so ago. I think he started out with good intentions, although it's really hard to tell. Maybe it was really all about power and control for him from day one. Certainly now he's a different person - appears totally corrupted by money and power.
Dario used to at least emphasize the potential positives of AI while being worried about the negatives, but unlike Hassabis/DeepMind he has done nothing to bring about the positive part and is now just accelerating the harmful part as fast as he can. Google is an AI company, bringing us things like AlphaFold, and Anthropic (also OpenAI) are just LLM companies.
It's just the worst version of capitalist game theory. If I don't do the bad thing and get rich, then someone else will do the bad thing and I won't get rich.
But this new tool is not a blacksmith’s tool in the traditional sense. It’s more like an automated blacksmith that works fast, for cheap, does mediocre work, but has this mediocre skill level in an exceptionally broad range of tasks.
Why not? Blacksmithing and coding have a hell of a lot in common. In both disciplines toolmaking is extremely important. Often you have to make custom tools to accomplish a design--e.g. a twisting wrench or a form tool. Sometimes you have to make tools that get used once and thrown away, like a jig temporarily welded to a piece to hold it in place while you build its sibling assembly. Sound familiar? I do this kind of thing all the time in code.
Another similarity is the relative simplicity of the underlying structure of the system. You essentially have two hammers (one small one you swing with your hand and another big one that is planted on the ground), some material, and some heat. You build the rest.
Another similarity is the resistance to automation. A skilled blacksmith is a versatile worker. You can create assembly lines to automate any one thing they might produce. The end product will not have the same quality--it will not truly be wrought iron, each piece will not be unique, there will be nothing of the aesthetic taste of the artist in it, but if you're just some bean counter who doesn't care about those things you'll be able to sell it. But if you need the optionality to produce any of those things, automation is not your friend. And some things just cannot be automated, at least not without extreme costs or very poor results--shoeing horses comes to mind.
If you have Bitwarden installed on an iPhone, you can export directly to Apple Passwords with no intermediate steps or trying to figure out where to save the unencrypted CSV file. I just did this and it looks pretty good so far.
You should look into how often people are using tools like WisprFlow and SuperWhisper. Voice is a very native mechanism. Most people working in open floor plans are wearing headphones anyway. As long as you're not screaming, it's probably fine. Maybe we'll move away from open plan offices in a bid for efficiency, which I would welcome.
I am moving full remote because dictation is such a better input mechanism for most of my AI interactions that I have become less efficient sitting in my open floorplan desk at the office because I cannot dictate there and the latency adds up. Typing is just achingly slow these days.
I also feel this way, but more importantly, I feel like my sentences are more coherent when typed because typing allows for corrections and modifications of ideas. Do whispr people just … get coherent, finalized ideas out in a single shot without any misspoken words?
It's like a hidden curse of LLMs -- they're so good at parsing intended meaning from non-grammatically-correct language that we don't have to be very good at clear communication.
Eventually all LLMs will be controlled by humans uttering terse guttural grunts. We will all become Neanderthals, with machines that deliver our every whim.
The transcription gets post-processed by an LLM (with different styles, based on prompts), so that it removes fillers, fixes homophones, changes the style, etc.
I recommend the youtube channel @afadingthought to see what people come up with (like v=283-z29TXeM).
You should look into how often people are using rectangles with buttons on them. They may be a bit archaic, but they are my preferred input method. For example, thanks to rectangles with buttons, the other people in my vicinity do not need to hear about the inane internet arguments I routinely involve myself in.
I dunno how I can express this best, but I found out a very long time ago that my problem with voice input wasn't that it wasn't good enough. My problem with voice input is that I don't want it. I am very happy for people who use these tools that they exist. I will not be them. Yes I am sure.
And yes, I know SuperWhisper can run offline, but it is a notable benefit that, unlike many modern speech recognition tools, my keyboard does not require an always-active Internet connection, a subscription payment, or several teraflops of compute power.
I am not a flat-out luddite. I do use LLMs in some capacity, for whatever it is worth. Ethical issues or not, they are useful and probably here to stay. But my God, there are so many ways in which I am very happy to be "left behind".
Don't know if you're making a joke, but call center workers using a phone is not the same thing as a call center worker doing all their work on a phone. Worked in a call center for 4 years, one thing everyone needed after their shift was to just STFU for a few hours to decompress.
Here’s my optimistic take - Google is already supplying Gemini/Gemma models for the next generation of Apple Intelligence. It makes complete sense for them to enter the hardware market.
I’d be happier if they used more on-device models by optimizing their hardware for the next generation of Gemma models.
Are my 70 year old parents regular people? They've never had tech jobs, and they figured out how to use AI once I installed ChatGPT on their phone. They provide it pictures, talk to it, and also use text input.
Are the majority of people who don't like / don't use AI not regular people? Definitionally, they are, more so than your parents. Funny how you try to make a general statement but immediately fall back to anecdotes when pressed.
That might be an absurd comparison, but we can fix that.
If you were being charged per character, or running down character limits, and printing on printers that were shared and had economic costs for stalled and started print runs, then:
You wouldn’t “need” to understand. The prints would complete regardless. But you might want to. Personal preference.
>If you were being charged per character, or running down character limits, and printing on printers that were shared and had economic costs for stalled and started print runs,
and the system was being run by some of the planet’s brightest people whose famous creation is well known to disseminate complex information succinctly,
>then:
You would expect to be led to understand, like… a 1997 Prius.
“This feature showed the vehicle operation regarding the interplay between gasoline engine, battery pack, and electric motors and could also show a bar-graph of fuel economy results.” https://en.wikipedia.org/wiki/Toyota_Prius_(XW10)
Wait what? You don’t use the model to investigate new areas of the code you are unfamiliar with, because you can’t trust the model? How freaking bad is Gemini and internal tooling at Google?
With Claude Code or Codex, I am able to build enough of an understanding of dependencies like the front end, or data jobs, that I can make meaningful contributions that are worth a review from another human (code review). You obviously have to explore the code, one prompt isn’t enough, but limiting yourself is an odd choice.
The lack of trust isn't because of its abilities. The lack of trust is because OpenAI publicly suggested something about licensing our code bases. They hinted at a rug pull along the lines of "if you use our generated code, we would like to get a % of revenue you make from it".
As for Claude - as mentioned I do use it. But, I remember they use your code for training their models. I am not ok with this. We just have different priorities.
Prices have gone sky-high since lockdown. In fact, it's funny seeing the British media going on about the "Cost of Living crisis" all the time, but failing to acknowledge one of the most obvious causes of it.
I predicted a massive price hike way back in the summer of 2020, because somehow we were going to have to pay for the lockdown, and many people didn't believe me. Now it's here, people are trying to tell me it was too long ago, even though economics can run in ten or twenty year cycles.
That's going to depend heavily on where you live in Seattle or NYC. London has some of the most expensive real estate in the world and I can say from experience that you get much more of an apartment for your money in NYC or Seattle than in London (lived in all three).
Having great tools means more impressive solutions, not fewer blacksmiths.