A benchmark called 'BS Bench' tests LLMs by asking nonsense questions whose premises are logically incoherent (e.g., relating fire safety codes to curry recipes). Claude models generally refuse to answer such questions, while OpenAI and Google models tend to confidently fabricate detailed answers. Kimi K2 (nicknamed 'Kimmy K') surprisingly outperforms the OpenAI and Google models on pushback. The deeper concern is that LLMs act as skill multipliers: engineers with poor judgment who use AI confidently will make bad decisions faster and at greater scale. The real danger isn't obviously nonsensical questions but subtly flawed ones that the AI answers without pushback.
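The evaluation idea described above can be sketched as a small harness: feed each model a nonsense-premise prompt and score whether it pushes back or fabricates. This is a minimal illustration, not the actual BS Bench code; the function names, the marker list, and the stub models are all hypothetical.

```python
# Hypothetical sketch of a nonsense-premise benchmark harness.
# Everything here (classify_response, PUSHBACK_MARKERS, the stub
# models) is illustrative, not the real BS Bench implementation.

PUSHBACK_MARKERS = (
    "doesn't make sense",
    "no meaningful relationship",
    "the premise is flawed",
    "these are unrelated",
)

def classify_response(answer: str) -> str:
    """Label an answer as 'pushback' or 'fabrication' via keyword match."""
    lowered = answer.lower()
    if any(marker in lowered for marker in PUSHBACK_MARKERS):
        return "pushback"
    return "fabrication"

def score_model(ask, prompts) -> float:
    """Fraction of nonsense prompts the model pushes back on.

    `ask` is any callable mapping a prompt string to an answer string,
    so a real API client could be dropped in here.
    """
    labels = [classify_response(ask(p)) for p in prompts]
    return labels.count("pushback") / len(labels)

# Stub models standing in for real LLM calls.
nonsense_prompts = [
    "How do fire safety codes determine the spice level of a curry?",
]

def skeptical_model(prompt: str) -> str:
    return "The premise is flawed: these are unrelated domains."

def confident_model(prompt: str) -> str:
    # Confidently fabricates a detailed, authoritative-sounding answer.
    return "Fire codes cap curry spice levels under a building's egress rating."

print(score_model(skeptical_model, nonsense_prompts))  # 1.0
print(score_model(confident_model, nonsense_prompts))  # 0.0
```

A real harness would replace keyword matching with a stronger judge (e.g., a second model grading the response), since fabricated answers rarely announce themselves; the keyword version just keeps the sketch self-contained.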