About:

Concise, curated insights on AI Engineering, LLMs and Agentic Architectures by Anup Jadhav. The AI Engineering Brief is a Substack publication with hundreds of subscribers.

Regression testing is crucial for LLM systems because it ensures that prompt changes do not cause unnoticed quality drops; the post advocates a systematic evaluation approach.
Evaluation-Driven Development (EDD) offers a systematic approach to testing LLM systems, addressing the shortcomings of traditional software testing methods by focusing on quality criteria and iterative evaluation.
A two-layer architecture combining Temporal and LangGraph enhances multi-agent system reliability and performance by separating orchestration from agent logic.
The article discusses the limitations of naive Retrieval-Augmented Generation (RAG) systems when deployed in production environments, particularly under high traffic conditions. It outlines the four-step process of naive RAG and i...
This guide provides a comprehensive understanding of transformers by tracing the historical development of neural network architectures, including RNNs, LSTMs, and CNNs, leading to the creation of transformers. It explains key con...
A golden dataset is essential for evaluating LLM responses, defining correctness through input/output pairs rather than exact string matching.
Using AI coding assistants may hinder developers' learning, as reliance on AI leads to poorer comprehension and debugging skills compared to traditional learning methods.
The AI industry in 2025 showcases significant breakthroughs but is plagued by contradictions in economics, job creation, environmental impact, and transparency.
The post argues against 'Vibe Coding' and promotes 'Context Engineering' to ensure reliable software development with AI, emphasizing structured workflows over casual interactions.
The blog post discusses the paper 'Generative UI: LLMs are Effective UI Generators' from Google Research, which argues that large language models (LLMs) should generate complete user interfaces instead of just text responses. The ...
The post provides a guide on evaluation techniques for LLM systems, emphasizing cost-effective methods to ensure quality output.
The article discusses the limitations of using Postgres for storing vectors, as highlighted by Alex Jacobs in 'The Case Against pgvector.' It emphasizes that while it may seem convenient to use a single database for vector storage...
Stripe's 'minions' system highlights the critical role of structured constraints and human oversight in managing AI-generated code for improved reliability.
The Claude Code team emphasizes that evolving AI agent tools requires simplicity and adaptability, and that agents often benefit from fewer, more expressive tools rather than added complexity.
Davis Haupt's Markov language aims to optimize programming for machine fluency, potentially enhancing human readability and addressing the trade-off between ease of writing and comprehension.
Garry Tan's /plan-exit-review command for Claude Code ensures thorough self-review of coding plans, enhancing quality and reducing scope creep.
A study on Cursor, an AI coding assistant, reveals that junior developers utilize autocomplete features more, while senior developers prefer the Cursor Agent for delegating tasks. The research indicates that senior developers, wit...
The article discusses the shift in the role of coders towards becoming orchestrators in the context of AI and agentic coding. It emphasizes the importance of human judgment in the orchestration process, highlighting the need for c...
The article discusses an experiment where six large language models (LLMs) were given $10,000 each to trade perpetual futures autonomously, revealing distinct trading behaviors despite identical conditions. Notable patterns emerge...
Many companies falsely claim to be AI-powered by merely using existing tools, while true value creation requires a strategic mindset shift and original development.
AI-assisted coding advice often contradicts itself, reflecting diverse user experiences and the need for personal adaptation and understanding of coding fundamentals.
Anthropic's and OpenAI's contrasting fast-inference strategies suggest that speed may be less important than accuracy in AI performance.
The term 'Deep Blue' encapsulates the existential dread software engineers face as AI threatens their profession, highlighting a unique tension between identity and productivity.
Boris Cherny provides essential tips for using Claude Code, highlighting its adaptability and the advantage experienced developers have in utilizing AI tools effectively.