About:

Emir is a researcher and software engineer with interests in maths, stats, and astronomy, currently pursuing a PhD.

Interests:

Maths, Stats, Computer science, Astronomy, Analytical philosophy, Applied maths

Outgoing Links:

Gwern Branwen

Posts:

The post discusses a scheme for estimating probabilities of subsets of binary variables using Dempster-Shafer theory, which allows for the assignment of probabilities directly to subsets of events. It contrasts this with Bayesian ...
Differentiable memory leverages linear algebra to create a compressible key-value store, enhancing deep learning's attention mechanism for various applications.
The post develops an unsupervised query tagger based on evidence theory, which improves query understanding by tagging user queries with relevant labels derived from OpenStreetMap data.
A statistical analysis of Snakes & Ladders computes the expected number of turns needed to finish the game, using Markov chain theory and transition probabilities.
The blog post discusses methods for sorting fractions under uncertainty, focusing on the binomial distribution and confidence intervals for estimating the fraction of successful trials. It presents two approaches: a Bayesian metho...
The post presents the pyevidence repository, offering tools to implement evidence theory while addressing its computational challenges.
The text presents a weak supervision paradigm called 'data programming', which uses maximum likelihood estimation to produce soft labels from heuristics. It includes a simple example to show that the methods work and discusses the ...
The author uses 500 Hacker News titles and an LLM to derive an article ranking model from a user-supplied preference description. The LLM supplies the labelled data, whilst ridge regression and cheap sentence transformer embedding...
The post explains the Kelly criterion and how to derive it, then gives a simple way to extend it to multiple simultaneous independent binary bets, along with a Python function that implements the extension.
The text discusses the use of RBF kernel approximation with random Fourier features in machine learning. It explains the problems with linear methods and how kernel regression can address these issues. It also introduces the rando...
The text discusses linear metric learning and how to find a transformation A that makes the sum of squared differences between i and j similar, regardless of whether it's calculated in terms of x or y. It also explains how to appro...
The text discusses the 'Billion Row Challenge' in Fortran, which involves processing 1 billion rows of weather station data to compute the min, max, and mean for each station as quickly as possible. The author documents their journey from a ...
The author discusses their experience solving Advent of Code puzzles using Prolog, Haskell, Python, and Scala. They compare the ease of coding in each language, noting that Prolog was the most difficult but also the most mind-expa...
The text introduces a novel logic puzzle called 'Domicles' that uses domino tiles. The author explains the rules of the game, provides examples, and presents a Prolog implementation. The difficulty of the puzzles is discussed, and a ...
The text discusses a minimal proof-of-concept for a stochastic simulator in Prolog, built as a meta-interpreter. It explains the interpreter's implementation, syntax, and semantics, draws conclusions, and provides examples and sim...
The author discusses the use of logic programming for data analysis, specifically analyzing diamond prices using a symbolic approach. The post covers data preparation, domain knowledge, consistency and coverage checking, and price...
The text describes the construction of a Block Dominoes playing algorithm for a hidden information variant of the game. The author built a game simulator, learned from a heuristic algorithm, and developed some playout-based algorit...
The text analyzes the data job market using 'Ask HN: Who is hiring?' posts from 2013 to the present. It suggests that the Data Scientist role is in decline and that skills such as data mining and visualization are also out of favo...
The text discusses a riddle about drawing playing cards and the optimal stopping rule to maximize expected payoff. The author shares their thought process and the statistical approach they used to solve the problem.
The author discusses the use of a mark and recapture experiment to estimate the total number of gym members based on the number of people repeatedly seen at the gym. They explore the Lincoln-Petersen estimator and Poisson regressi...
The text explains the methods of blocking, optimal design, and covariate adjustment to improve the power of experiments. It emphasizes the importance of these methods for data scientists working with online experiments, and provid...
The text discusses the use of logic programming for clustering, emphasizing its suitability for general commercial use cases. It presents artisanal clustering algorithms in Prolog demonstrated on mock data and explains how domain ...
The text discusses integrating Prolog as a critical component of data science analysis: analytic methods generate properties of the data, and Prolog reasons about the data via those generated properties. It inclu...
The author discusses the challenges of working with large SQL codebases and the need for a more composable SQL. They explore Logica and the use of the M4 macro preprocessor to create shared libraries and abstract common parts of SQL queri...