Bundled by Hannes Moser

☕️ Daily morning read

What's included

[ On | No ] syntactic support for error handling

Quantum Computing Without The Linear Algebra

Machines of Loving Grace

Relevance Filtering for Embedding-based Retrieval

Kernel Trick

Search results diversification with Metarank

Revolutionize Text Deduplication in Large Language Models with Xorbits

nvmath-python

Excitement-driven development is the best?

Misconceptions In Finite-Trace and Infinite-Trace Linear Temporal Logic

Bitwise Binary Search: Elegant and Fast

How Elon Musk and SpaceX Plan to Colonize Mars - The New York Times

What is Bipartite Graph

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

How To Design Software Architecture For Startups

Illustrated Guide to LSTM’s and GRU’s: A step by step explanation

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Don’t Build AI Products The Way Everyone Else Is Doing It

Crunchy Bridge for Analytics: Your Data Lake in PostgreSQL

Learning Mental Maths with The Abacus Finger Theory

How to do distributed locking

Interactive Topic Modeling with BERTopic

Similarity Search with IVFPQ

SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval

We need to talk about interactors

Cosine Similarity for 1 Trillion Pairs of Vectors

The Okapi BM25 formula

Learn-to-Rank with OpenSearch and Metarank

From RankNet to LambdaRank to LambdaMART: An Overview

Zombie Startups

Names of large numbers - Wikipedia

An Image is Worth 32 Tokens for Reconstruction and Generation

Persistent Storage Options

Breaking My University's Machine Learning Competition

Getting started with embedding V8

Announcing Vespa Long-Context ColBERT

Get shit done by warping friction-space

Extracting Concepts from GPT-4

Fine-Tuning Sentence Transformers for Embedding Search

Binary Decision Trees

Product Quantization for Vector Similarity Search (+ Python)

SSTable Definition

How We Fused DuckDB into Postgres with Crunchy Bridge for Analytics

The Math Behind KAN — Kolmogorov-Arnold Networks

Bags of Documents and the Cluster Hypothesis

A Complete Guide to Causal Inference in Python

Elastic Search 8.14: Faster and more cost-effective vector search, improved relevance with retrievers and reranking, RAG and developer tooling

Scalar quantization 101

Partitioning an existing table

Mojo vs. Rust: is Mojo 🔥 faster than Rust 🦀 ?

Byte-Pair Encoding tokenization

From Modular's blog we learn that Mojo (the new programming...

LZ4 Compression Explained

Why, after 6 years, I’m over GraphQL

An Intuitive Guide to Maxwell’s Equations

React Project Architecture using Barrels

Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode

githublog/2024/5/29/fast-inverse-sqrt.md at main · francisrstokes/githublog · GitHub

Barrel files and why you should STOP using them now

I saw a bunch of "Ruby/Hotwire is slow" and "Who...

Introduction to Log structured merge (LSM) Tree

A Hybrid information retriever with DuckDB

Vespa and LLMs

Three Laws of Software Complexity (or: why software engineers are always grumpy)

Recommender system using Bayesian personalized ranking

An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them

I just hiked for 9 grueling days to Mt Everest

Training and Finetuning Embedding Models with Sentence Transformers v3

What's new in Kotlin 2.0.0

Consensus-Driven Development

Max Span Tree Overview

C language inventor spurns Google's language exam

In defense of linked lists

It’s always TCP_NODELAY. Every damn time.

KAN: Kolmogorov-Arnold Networks

Understanding Predictive Maintenance — Data Acquisition and Signal Denoising

Why is Kafka so fast? How does it work?