Bundled by Ludwig ABAP

What's included

Leslie Lamport

Indices and tables


A Guide to Undefined Behavior in C and C++, Part 1

I've spent the past ~2 weeks building a GPU from...

Bare Bones

MIT 6.004 Computation Structures, Spring 2017 - YouTube

The Graphics Codex

[2305.13009] Textually Pretrained Speech Language Models

Notes on partial borrows

Dioxus Labs + “High-level Rust”

Compile-Time Configuration For Zig Libraries


Zig's HashMap - Part 1

Zig Parser

Copying Better: How To Acquire The Tacit Knowledge of Experts

IronBeetle with matklad - YouTube

Causal ordering

Assorted thoughts on zig (and rust)

Columnar kernels in go?

An opinionated map of incremental and streaming systems

Internal consistency in streaming systems

Pain we forgot

Have you tried rubbing a database on it?

The shape of data

Reflections on a decade of coding

Prospecting for Hash Functions

The Missing Zig Polymorphism / Runtime Dispatch Reference


How To Become A Hacker

the rr debugging experience

Text Buffer Reimplementation

What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?

Using ASCII waveforms to test hardware designs

Rust 2019 and beyond: limits to (some) growth.

Your ABI is Probably Wrong

Napkin Math

Don't write bugs

technicalities: "not rocket science" (the story of monotone and bors)

Why is Python slow

Design duality and the expression problem

Random Thoughts On Rust: And IDEs

John Carmack on Inlined Code

A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World

What is Systems Programming, Really?

MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding

Mitchell Hashimoto


UB Might Be a Wrong Term for Newer Languages Apr 2, 2023

What Every C Programmer Should Know About Undefined Behavior #1/3

The Rustonomicon

chrono-Compatible Low-Level Date Algorithms

Step-by-Step Diffusion: An Elementary Tutorial

Teach Yourself Programming in Ten Years

So Many New Systems Programming Languages II


From Theory To Implementation

Speech-to-text models

Ray Tracing in One Weekend

Untangling Lifetimes: The Arena Allocator

Tree-Structured Concurrency — 2023-07-01

immersivemath: Immersive Linear Algebra

BSTJ 57: 6. July-August 1978: The UNIX Time-Sharing System. (Ritchie, D.M.; Thompson, K.)

Principles of compiler design

A Mathematical Theory of Communication

Mapping the whole internet with Hilbert curves

Hausdorff dimension - Wikipedia


A Recipe for Training Neural Networks

You own your data, in spite of the cloud

Writing CUDA Kernels for PyTorch

Multi-Query & Grouped-Query Attention

999 crates of Rust on the wall


Arithmetic functions

An interactive study of queueing strategies

A DSL for Implementing Math Functions



Accidentally Turing-Complete

The Art of Computer Programming, Vol. 4 Fascicle 6


Exploring architectures - Transformers II

What are Diffusion Models?

Problems with BQN

Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model

The borrow checker within

How should I read type system notation?

Writing a Simple Garbage Collector in C

A decade of developing a programming language

The Rust I Wanted Had No Future

The Garbage Collection Handbook

A high-bias, low-variance introduction to Machine Learning for physicists

How diffusion models work: the math from scratch




A Distributed Systems Reading List

An Introduction to Assembly Programming with RISC-V

SRAM Architecture

MLIR: A Compiler Infrastructure for the End of Moore's Law

MLIR — Getting Started

Chapter 2 Basics of SIMD Programming

Matrix multiplication in Mojo

Matrix Multiplication on CPU

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog

The Annotated Transformer

Anonymity and the internet

Auto-Regressive Next-Token Predictors are Universal Learners

Where Vim Came From

Building and operating a pretty big storage system called S3

Unnamed Document

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study


New Scaling Laws for Large Language Models

Binary Magic: Building BitNet 1.58bit Using PyTorch from Scratch

king - man + woman is queen; but why?

1-bit Model

Human Knowledge Compression Contest

Heatmaps and CNNs Using

Where do LLMs spend their FLOPS?

The Annotated Diffusion Model

Defusing Diffusion Models

The Illustrated Stable Diffusion

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

Sparse Autoencoders Find Highly Interpretable Features in Language Models

KAN: Kolmogorov–Arnold Networks

Structure and Interpretation of Computer Programs, 2nd ed.

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

An Infinitely Large Napkin

IEEE Xplore Full-Text PDF:

Root Mean Square Layer Normalization

Terry A. Davis

Pattern Recognition and Machine Learning

Ludwig Wittgenstein: The Duty of Genius

Generative Agents: Interactive Simulacra of Human Behavior

Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks

Revisiting Deep Learning as a Non-Equilibrium Process

Dissipative Adaptation: The Origins of Life and Deep Learning

A Gentle Introduction to LLVM IR


The Art of Embeddings: Transforming Text for Vector Databases (Part 2)

AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)

Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Intro to Deep Learning and Generative Models Course - YouTube

The Little Book of Deep Learning


Sequence to Sequence Learning with Neural Networks

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Latent Interfaces


New paper on whether LLMs think in English (Wendler et...

How Netflix Really Uses Java

Scheduling Internals

Glossary of Deep Learning: Word Embedding


How to Use t-SNE Effectively

Temperature as Joules per Bit

Consciousness, Cognition and the Neuronal Cytoskeleton – A New Paradigm Needed in Neuroscience

OpenMEA: Open-Source Microelectrode Array Platform for Bioelectronic Interfacing

Landauer's principle

Bremermann's limit

Bekenstein bound


Deep Learning Course

Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories



A New Physics Theory of Life

K-Level Reasoning with Large Language Models

Competitive Programmer's Handbook

Writing an OS in Rust

Ever wanted to make your own programming language or wondered how they are designed and built?



Measuring Faithfulness in Chain-of-Thought Reasoning

Links :


It took me 5 years to master all 24 of...

Images are super redundant

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Turing-1951 Intelligent Machinery-a Heretical Theory

Self-Rewarding Language Models

Software Development Trends 2023/2024 - Vol. 2.

Word2vec from Scratch

MemGPT: Towards LLMs as Operating Systems

Visual Guides to understand the basics of Large Language Models

Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs

Thinking in Systems, by Donella H. Meadows and Diana Wright

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

This project is about how to systematically persuade LLMs to jailbreak them.

Pruning vs Quantization: Which is Better?


Mixtral of Experts

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

Discovering Language Model Behaviors with Model-Written Evaluations

Getting Started with Elastic Stack 8.0

Understanding The Exploding and Vanishing Gradients Problem

Practical Deep Learning for Coders 2022

The fastai book, published as Jupyter Notebooks

Elasticsearch 8.x Cookbook: Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise, 5th Edition

Attention? Attention!

An Intuition for Attention

Pen and Paper Exercises in Machine Learning

Transformers From Scratch

Mathematics for Machine Learning

Linear Algebra Review and Reference

Probability and Information Theory

Linear Algebra

An overview of gradient descent optimization algorithms

How GPT3 Works - Visualizations and Animations

GPT in 60 Lines of NumPy

Tensor2Tensor Intro

The Annotated Transformer

The Illustrated Transformer

Neural Machine Translation (seq2seq) Tutorial

What Are Word Embeddings for Text?

Deep Learning for Natural Language Processing

Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

The Random Transformer


Stanford CS25 - Transformers United - YouTube

CS25: Transformers United V3

(A Brief Video Overview of) Neural Circuit Diagrams

Spaces using openai/whisper-large-v2 232

Text Summarization: How to Calculate BertScore

Some Core Principles of Large Language Model (LLM) Tuning

MotionGPT: Human Motion as a Foreign Language



An intuitive introduction to text embeddings

Watching Neural Networks Learn

Mathematics for Machine Learning

What is backpropagation really doing? | Chapter 3, Deep learning

Generative Agents: Interactive Simulacra of Human Behavior

VOYAGER: An Open-Ended Embodied Agent with Large Language Models

Can we align LLMs to honesty via instruction finetuning?

Getting Started with Reader