About:

Daniel Kang is an assistant professor and the author of a personal Substack blog.

Outgoing Links:

Simon Willison
The post discusses the limitations of large language models (LLMs) in accessing real-time and private information, leading to hallucinations and harmful outputs. It introduces SafeSearch, a safety alignment framework for search ag...
The post introduces DRAMA, a new paradigm for AI agents that integrates data collection, transformation, and analysis into a unified workflow. It addresses the challenges faced by analysts in the social sciences who deal with dyna...
This blog post discusses the enhancements made to CVE-Bench, an AI agent benchmark designed to evaluate the capabilities of AI agents in exploiting web security vulnerabilities. The authors, Yuxuan Zhu, Antony Kellermann, and Dani...
The blog post discusses ZKTorch, an open-source framework for zero-knowledge machine learning (ZKML) that allows AI model owners to generate cryptographic proofs for their outputs without revealing proprietary data. It highlights ...
The post discusses the limitations of SWE-bench Verified, a benchmark for evaluating AI coding capabilities, highlighting that even expert-curated unit tests can miss critical edge cases. The authors introduce UTBoost, a novel tec...
The post discusses the inadequacies of current AI agent benchmarks, highlighting their complexity and unreliability. It critiques benchmarks like WebArena for misjudging AI capabilities and presents a checklist (AI agent Benchmark...
The blog introduces PilotDB, an online Approximate Query Processing (AQP) system designed to address common challenges in production adoption of AQP, such as the need for DBMS modifications, continuous maintenance of offline compu...
The blog post discusses the increasing adoption of Extract-Load-Transform (ELT) pipelines by data engineers and the challenges they face in handling complex data formats and transformation queries. It introduces ELT-Bench, a new b...
The post discusses a critical vulnerability in Twitter's web application that affected 5.5 million users and explores the emerging threat of AI agents autonomously exploiting security vulnerabilities in web applications. It introd...
The post discusses the risks of Indirect Prompt Injection (IPI) attacks on AI-powered personal finance assistants and other AI agents. It highlights how attackers can embed malicious instructions within seemingly harmless external...
Google DeepMind's AlphaEvolve has achieved significant advancements, including a more efficient 4x4 matrix multiplication algorithm and a hexagonal packing algorithm, resulting in a 23% speedup in Gemini training kernels. This ind...
The post discusses the effectiveness of Reinforcement Post Training (RPT) in large language models (LLMs) across various domains, revealing that while RPT significantly improves performance within training domains (like math and c...
The blog post discusses the training of spiky superhuman AI (SSAI), particularly focusing on Google's AlphaEvolve system and its use of reinforcement learning (RL). It explains how RL allows AI to explore environments and achieve ...
The blog post discusses the critical role of post-training techniques, particularly supervised fine-tuning and reinforcement learning, in the advancement of large language models (LLMs). It highlights the increasing costs associat...
REPRO-Bench continues to challenge AI agents' ability to assess reproducibility in social science research, revealing significant limitations even with advanced models like GPT-5.2.
CVE-Bench introduces a leaderboard to evaluate AI agents' cybersecurity capabilities, aiming to enhance penetration testing and address misuse risks in real-world applications.
The post critiques the claim made by podcaster Dwarkesh that reinforcement learning (RL) provides only 1 bit of information per rollout. It argues that this assertion is incorrect by using AIME as an example, demonstrating that th...
The blog post discusses the challenges of research reproducibility in social sciences, highlighting significant shortcomings in current methods. It introduces REPRO-Bench, a benchmark designed to evaluate AI agents' ability to ass...
The post discusses the potential dangers of household humanoid robots, particularly focusing on a research project called BEAT that demonstrates how visual triggers can be used to implant backdoors in vision-driven, multimodal lar...
The blog post discusses LEAP, an LLM-powered library designed to assist social scientists in analyzing unstructured data, such as social media texts. It highlights the challenges of traditional data analysis methods, which are cos...
The blog post discusses the recent release of OpenAI's real-time voice API and its potential for misuse in executing phone-based scams. The authors conducted research demonstrating that voice-enabled AI agents can autonomously per...
The post discusses the current state of autonomous AI software engineers, highlighting their limited effectiveness and accessibility. Despite the hype, these AI engineers are not yet functional in real-world applications, with ben...
The blog post discusses the capabilities and limitations of AI in software engineering, particularly in converting Python code to C++ for performance improvements. It highlights the challenges faced when using AI for complex codin...

2024-09-03
