About:

Writings on Software Engineering, Software Architecture, Generative AI and LLMs

The blog post discusses insights from a paper by the Google DeepMind team on the Gemini Embedding model, which achieves state-of-the-art performance across various benchmarks. It highlights the model's use of multi-resolution loss...
The post discusses the development of regulatory intelligence software focused on extracting obligations from dense PDF documents. It outlines a two-step approach for obligation extraction, emphasizing the importance of breaking...
A comprehensive guide on building a production-ready voice agent platform for IT support, focusing on architecture, best practices, and user interaction strategies.
The author describes debugging a Python HTTP client built with the httpx library that fails with a 403 error when pointed at a production environment. Despite the same request working in other clients like Bruno and ...
The post addresses the misinterpretation of 'now' as 'no' in speech-to-text systems and offers solutions to enhance user experience through better prompts and clarification.
The blog post discusses the author's experience in extracting information from hotel tariff sheet PDF documents using various OpenAI models. It details the process of converting PDFs to images and utilizing the OpenAI API to extra...
The blog post discusses a Microsoft Research paper analyzing the impact of AI on professional work by examining 200,000 anonymized conversations with Microsoft Bing Copilot. It highlights that in 40% of cases, user goals and AI ac...
The author discusses their experience with the coderunner-ui project, a local-first AI workspace that requires macOS on Apple Silicon. After facing compatibility issues on an Intel Mac, the author explores the codebase and collabo...
The blog post discusses the mini-swe-agent, a tool that operates in a continuous loop to solve problems by executing bash commands based on queries to a language model (LLM). It details the agent's core loop, which includes initia...
Cursor, an AI-powered code editor, has revised its pricing model to better reflect actual API costs, moving away from a straightforward request-based system. The old model allowed for a fixed number of requests but did not account...
The blog post discusses the use of the Pydantic MCP Run Python package as an open-source alternative to OpenAI's Code Interpreter for executing Python code in a secure environment. It details the installation of Deno, the setup of...
The blog post discusses how to calculate a realistic accuracy target for large language model (LLM) tasks by evaluating the financial impact of model decisions. It uses a fake news classifier as an example, explaining the costs as...
The blog post discusses the criteria and considerations executives should take into account when selecting vendors for Generative AI (GenAI) solutions. It emphasizes the importance of understanding the specific needs of the organi...
The blog post discusses the testing of Google's Gemma 3 270M, a compact language model designed for efficient AI capabilities. The author, involved in building AI voice agents, explores the model's ability to generate variations o...
The AbsenceBench paper shows that large language models (LLMs) struggle to detect missing information, even though they excel at finding specific information. The research methodology, striking performance results, attention mec...
The article discusses the use of OpenAI's Code Interpreter feature to perform ad-hoc analysis of user-uploaded files, such as Excel, CSV, or JSON files. It explains how the code interpreter tool can be used to process large files,...
The term 'reward hacking' in AI refers to the phenomenon where an AI optimizes an objective function without achieving the intended outcome. This poses a fundamental challenge in AI alignment and reliability. The text discusses ex...
The text discusses the reasons why Claude Code is a CLI tool instead of an IDE, citing the broad range of IDEs used at Anthropic and the belief that models are progressing so fast that IDEs may become obsolete. The author also men...
Mistral released a new model called Devstral, a 24B-parameter model designed for agentic coding tasks, with a long context window. It outperforms prior open-source models but has high latency and issues with generating valid J...
The article discusses how the author used Claude, an AI pair programmer, to implement a text synchronization feature for their video player at Videocrawl. It highlights the strengths and limitations of AI-assisted development, the...
The author discusses the idea of giving agency to the summarizer to generate dynamic summarization prompts. He shares his approach and provides a concrete example of summarizing the Search R1 paper. The LLM generated a summarizati...
The new Videocrawl feature lets users track their video progress across sessions, with a visual progress bar and automatic resumption from the last position. It is implemented using the browser's localStorage API and lays the...
The author discusses building a screenshot feature for Videocrawl, an AI companion app for videos. The feature aims to allow users to take a screenshot of the current video frame and save it as a note. The author uses LLMs for wri...
The text is a summary of a talk on AI Engineering at Jane Street, discussing how they trained their own model due to the limitations of off-the-shelf large language models with OCaml. They collected training data through workspace...