Nicholas Wilt

About:

Nicholas Wilt is a software architect and performance optimization specialist with interests in software engineering and parallel programming.

Website:

parallelprogrammer.substack.com

Specializations:

Software architect, performance optimization specialist, industry commentator

Interests:

Software engineering, parallel programming, industry commentary

2025-12-12 • performance software development c++ programming crtp polymorphism

CRTP (the Curiously Recurring Template Pattern) in C++ provides compile-time polymorphism without the overhead of virtual calls, which makes it well suited to optimizing limit order book implementations.
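As a rough illustration of the pattern named in this summary, here is a minimal CRTP sketch. It is not taken from the post: the names OrderHandler, NotionalHandler, handle_impl, and feed are hypothetical, and the "order" is reduced to a price and a quantity purely for illustration.

```cpp
#include <cstdint>

// Hypothetical base class: the derived type is passed as a template
// parameter, so the call below is resolved at compile time.
template <typename Derived>
class OrderHandler {
public:
    // Forwards to Derived::handle_impl without a virtual call.
    std::int64_t on_order(std::int64_t price, std::int64_t qty) {
        return static_cast<Derived*>(this)->handle_impl(price, qty);
    }
};

// Hypothetical handler that accumulates notional value (price * qty).
class NotionalHandler : public OrderHandler<NotionalHandler> {
public:
    std::int64_t handle_impl(std::int64_t price, std::int64_t qty) {
        total_ += price * qty;
        return total_;
    }

private:
    std::int64_t total_ = 0;
};

// Generic code written against the CRTP base; no vtable is involved.
template <typename H>
std::int64_t feed(OrderHandler<H>& handler) {
    handler.on_order(100, 2);         // running total: 200
    return handler.on_order(101, 3);  // running total: 503
}
```

Because on_order is an ordinary non-virtual member function, a compiler is free to inline handle_impl into the caller, which is the usual motivation for choosing CRTP over virtual dispatch in a hot path.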

2025-11-26 • technology cpu gpu video ram, sram, system ram moore's law semiconductors and chip technology

The post discusses the implications of Moore's Law and SRAM scaling in semiconductor technology. It highlights Gordon Moore's original observation about transistor density doubling and the subsequent challenges faced by SRAM in sc...

2025-10-27 • architecture machine learning gpu deep learning nvidia volta

The article discusses the evolution of NVIDIA's GPU architecture, particularly focusing on the performance improvements of the Volta-enabled DGX-1 systems compared to earlier models. It highlights the increasing disparity between ...

2025-10-19 • machine learning avx performance optimization quantization software engineering

The post discusses the QLoRA method for fine-tuning large language models using a 4-bit NormalFloat (NF4) representation for quantization. It explains how NF4 is derived from the FP32 data type, detailing the conversion process an...

2025-09-30 • performance computer science simd algorithms coding interviews

This article is a follow-up to a previous discussion on optimizing the 'Third Largest' algorithm using SIMD (Single Instruction, Multiple Data) techniques. It explores how SIMD can enhance the performance of sorting algorithms, pa...

2025-09-26 • performance data structures c++ programming algorithms coding interviews

The author reflects on the challenges of coding interviews, particularly focusing on the problem of finding the k'th largest element in an array. They discuss their experience with interview questions, specifically a Microsoft int...
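For readers unfamiliar with the problem, one common way to find the k'th largest element in C++ is std::nth_element, sketched below. This is an assumption about a reasonable baseline, not necessarily the approach the post takes. The helper name kth_largest is hypothetical, and duplicates are counted as separate elements here.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Returns the k'th largest element (k is 1-based), or std::nullopt when
// k is out of range. The vector is taken by value so the caller's data
// is not reordered.
std::optional<int> kth_largest(std::vector<int> v, std::size_t k) {
    if (k == 0 || k > v.size()) {
        return std::nullopt;
    }
    auto nth = v.begin() + static_cast<std::ptrdiff_t>(k - 1);
    // Partial partition in descending order: afterwards *nth is the
    // element that a full descending sort would place at index k - 1.
    std::nth_element(v.begin(), nth, v.end(), std::greater<int>());
    return *nth;
}
```

std::nth_element runs in linear time on average, so this avoids the cost of sorting the whole array when only one order statistic is needed.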

2025-09-16 • api gpu memory management nvidia 3d textures

The article discusses the design and implementation challenges of CUDA 2.0, particularly focusing on the introduction of 3D textures and the complexities of backward compatibility in API design. It highlights the importance of mai...

2025-08-15 • intel operating systems sse4.2 mmx context switching

The post discusses Intel's introduction of MMX instructions to the x86 platform, focusing on the decision to alias new MMX registers with existing x87 registers to minimize disruption for users. It explains the context switching m...

2025-08-13 • architecture intel cpu parallel computing simd

The blog post discusses the history and significance of SIMD (Single Instruction, Multiple Data) instructions in CPU architecture, particularly on the x86 platform. It traces the evolution of CPU designs from SISD to SIMD, highlig...

2025-07-16 • game development computer architecture gpu memory management caching

The essay discusses the role of caches in computer architecture, arguing that they should be viewed as an abstraction rather than merely an optimization. It explores the historical context of cache design, the tension between deve...

2025-07-01 • engineering software development windows computer architecture gpu

The article discusses the concept of 'The Utility of Futility' in software engineering, emphasizing the importance of recognizing limitations and avoiding unnecessary complexity in system design. It illustrates this principle thro...

2025-06-29 • programming software development api error handling gpu

The article discusses the challenges of error handling in API design, particularly in the context of CUDA. It critiques the flawed design of the C runtime's atoi() function and explores three main error propagation methods: except...
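A well-known weakness of atoi() is that it returns 0 for unparseable input, so a caller cannot tell a failure apart from the string "0". The sketch below shows one alternative that reports failure explicitly, using std::from_chars from C++17. It is an illustration only and is not claimed to be one of the methods the article lists. The helper name parse_int is hypothetical.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses a base-10 int. Returns std::nullopt on any failure: empty
// input, non-numeric input, trailing characters, or overflow.
std::optional<int> parse_int(std::string_view s) {
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;  // failure is distinguishable from a parsed 0
    }
    return value;
}
```

With this shape, parse_int("0") yields a value of 0 while parse_int("abc") yields an empty optional, which is the distinction atoi() cannot express.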

2025-06-27 • api gpu thread safety nvidia public service commission

The post discusses the evolution of the CUDA driver API, particularly the transition of context management from a parameter-based approach to using thread-local storage (TLS). It highlights the implications of this change on API d...

2025-11-14 • finance trading strategies high-frequency trading code optimization clob

The article provides an in-depth introduction to Central Limit Order Books (CLOBs) and their significance in high-frequency trading (HFT). It explains how limit order books function, detailing the processes of submitting, matching...

2025-10-13 • programming software development error handling gpu nvidia

The article discusses the lack of consensus on error handling in CUDA programming, despite its long existence. It emphasizes the importance of checking error codes and presents a structured approach to error handling based on the ...

2025-07-16 • performance gpu memory management market segmentation

The article discusses the complexities and controversies surrounding managed memory in CUDA programming, particularly its performance implications. It contrasts CUDA's flat address space with segmentation, explaining how segmentat...

2025-12-17 • coding standards and guidelines safety critical systems axivion static code analyzer

The post covers Qt Group's Axivion static code analyzer, which enhances coding standards compliance for CUDA, and reflects on the historical significance of code analyzers in safety-critical software development.

2025-09-25 • design code quality and management maintainability software engineering refactoring

The post discusses the importance of writing maintainable code for future developers, emphasizing the role of interface design in enhancing code readability. It highlights the author's experience with the SI_Refresh() function in ...

2025-09-12 • api opengl directx computer science and education endianness

The post discusses the importance of consistency in decision-making within computer engineering, particularly when faced with multiple plausible options. It highlights the historical context of little-endian and big-endian data st...

2025-07-16 • programming performance warp gpu nvidia

The article discusses NVIDIA's shift away from supporting warp synchronous code in CUDA development, highlighting the performance motivations behind its use despite being technically incorrect. It explains the transition from usin...

2025-06-30 • programming error handling kernel development gpu asynchronous processing

The post discusses the use of cudaGetLastError() in CUDA programming, emphasizing that while it is often discouraged, it is necessary to check for errors immediately after a kernel launch that may fail due to misconfiguration. The...

2025-11-19 • programming software development computer science algorithms books

The author, a seasoned programmer since 1982, shares a curated list of essential programming books that have significantly influenced his career. He highlights key texts such as 'Introduction to Algorithms,' 'Programming Pearls,' ...

2025-12-17 • technology history development operating systems windows nt 4.0

The post examines how pricing strategies for development tools impacted the competition between OS/2 and Windows NT, leading to OS/2's market failure.

2026-01-01 • technical writing avx simd gpu nvidia

The author reviews the most engaging content from their Substack, emphasizing the popularity of SIMD-related articles and the challenges of content creation in a competitive digital landscape.