About:

Nicholas Wilt is a software architect and performance optimization specialist with interests in software engineering and parallel programming.

Interests:

Software engineering, parallel programming, industry commentary
The Curiously Recurring Template Pattern (CRTP) in C++ provides compile-time polymorphism without the run-time cost of virtual dispatch, making it well suited to optimizing limit order book implementations.
The post discusses the implications of Moore's Law and SRAM scaling in semiconductor technology. It highlights Gordon Moore's original observation about transistor density doubling and the subsequent challenges faced by SRAM in sc...
The article discusses the evolution of NVIDIA's GPU architecture, particularly focusing on the performance improvements of the Volta-enabled DGX-1 systems compared to earlier models. It highlights the increasing disparity between ...
The post discusses the QLoRA method for fine-tuning large language models using a 4-bit NormalFloat (NF4) representation for quantization. It explains how NF4 is derived from the FP32 data type, detailing the conversion process an...
This article is a follow-up to a previous discussion on optimizing the 'Third Largest' algorithm using SIMD (Single Instruction, Multiple Data) techniques. It explores how SIMD can enhance the performance of sorting algorithms, pa...
The author reflects on the challenges of coding interviews, particularly focusing on the problem of finding the k'th largest element in an array. They discuss their experience with interview questions, specifically a Microsoft int...
The article discusses the design and implementation challenges of CUDA 2.0, particularly focusing on the introduction of 3D textures and the complexities of backward compatibility in API design. It highlights the importance of mai...
The post discusses Intel's introduction of MMX instructions to the x86 platform, focusing on the decision to alias new MMX registers with existing x87 registers to minimize disruption for users. It explains the context switching m...
The blog post discusses the history and significance of SIMD (Single Instruction, Multiple Data) instructions in CPU architecture, particularly on the x86 platform. It traces the evolution of CPU designs from SISD to SIMD, highlig...
The essay discusses the role of caches in computer architecture, arguing that they should be viewed as an abstraction rather than merely an optimization. It explores the historical context of cache design, the tension between deve...
The article discusses the concept of 'The Utility of Futility' in software engineering, emphasizing the importance of recognizing limitations and avoiding unnecessary complexity in system design. It illustrates this principle thro...
The article discusses the challenges of error handling in API design, particularly in the context of CUDA. It critiques the flawed design of the C runtime's atoi() function and explores three main error propagation methods: except...
The post discusses the evolution of the CUDA driver API, particularly the transition of context management from a parameter-based approach to using thread-local storage (TLS). It highlights the implications of this change on API d...
The article provides an in-depth introduction to Central Limit Order Books (CLOBs) and their significance in high-frequency trading (HFT). It explains how limit order books function, detailing the processes of submitting, matching...
The article discusses the lack of consensus on error handling in CUDA programming, despite its long existence. It emphasizes the importance of checking error codes and presents a structured approach to error handling based on the ...
The article discusses the complexities and controversies surrounding managed memory in CUDA programming, particularly its performance implications. It contrasts CUDA's flat address space with segmentation, explaining how segmentat...
The post covers how Qt Group's Axivion static code analyzer supports coding standards compliance for CUDA, and reflects on the historical significance of code analyzers in safety-critical software development.
The post discusses the importance of writing maintainable code for future developers, emphasizing the role of interface design in enhancing code readability. It highlights the author's experience with the SI_Refresh() function in ...
The post discusses the importance of consistency in decision-making within computer engineering, particularly when faced with multiple plausible options. It highlights the historical context of little-endian and big-endian data st...
The article discusses NVIDIA's shift away from supporting warp-synchronous code in CUDA development, highlighting the performance motivations behind its use even though such code is technically incorrect. It explains the transition from usin...
The post discusses the use of cudaGetLastError() in CUDA programming, emphasizing that while it is often discouraged, it is necessary to check for errors immediately after a kernel launch that may fail due to misconfiguration. The...
The author, a seasoned programmer since 1982, shares a curated list of essential programming books that have significantly influenced his career. He highlights key texts such as 'Introduction to Algorithms,' 'Programming Pearls,' ...
The post examines how pricing strategies for development tools impacted the competition between OS/2 and Windows NT, leading to OS/2's market failure.
The author reviews the most engaging content from their Substack, emphasizing the popularity of SIMD-related articles and the challenges of content creation in a competitive digital landscape.