Join 📚 Kevin's Highlights

A batch of the best highlights from what Kevin's read.

Here’s what the entity resolution query looks like:

![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ec307e-6b10-46fd-8b6a-773f9f163875_1212x728.png)

I’m joining the table with itself on state + zipcode to reduce the search space and using string similarity thresholds for filtering potential duplicates. In entity resolution methodology this is known as “blocking.”

Fundamental Data Engineering Concepts - Part 2

Ergest Xheblati
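
The query itself is only available as an image, so here is a rough Python sketch of the same blocking idea, not the author's SQL. It assumes each record is a dict with hypothetical `id`, `name`, `state` and `zipcode` keys, and it swaps in `difflib.SequenceMatcher` from the standard library as the string similarity measure; the 0.8 threshold and the sample data are invented for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def candidate_duplicates(records, threshold=0.8):
    """Return pairs of record ids whose names look similar within a block."""
    # Blocking: group records by (state, zipcode) so that only records
    # sharing a block are ever compared with each other.
    blocks = defaultdict(list)
    for record in records:
        blocks[(record["state"], record["zipcode"])].append(record)

    pairs = []
    for block in blocks.values():
        # Pairwise comparison inside one block, the equivalent of a self-join.
        for left, right in combinations(block, 2):
            score = SequenceMatcher(
                None, left["name"].lower(), right["name"].lower()
            ).ratio()
            if score >= threshold:  # assumed threshold, tune for real data
                pairs.append((left["id"], right["id"]))
    return pairs


# Hypothetical sample data, invented for illustration.
records = [
    {"id": 1, "name": "Acme Corp", "state": "CA", "zipcode": "94105"},
    {"id": 2, "name": "Acme Corp.", "state": "CA", "zipcode": "94105"},
    {"id": 3, "name": "Acme Corp", "state": "NY", "zipcode": "10001"},
    {"id": 4, "name": "Zenith Bakery", "state": "CA", "zipcode": "94105"},
]
print(candidate_duplicates(records))
```

Record 3 has the same name as record 1 but sits in a different block, so the two are never compared. That is the trade-off of blocking: far fewer comparisons, at the cost of missing duplicates whose blocking keys disagree.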

That’s a lot to take in, but this image can help you visualise it: ![](https://www.fluentin3months.com/wp-content/uploads/2021/09/german-prepositions-info-683x1024.jpg)

German Prepositions – The Ultimate Guide (with Charts)

George Julian

If you’re mathematically inclined, then you could use the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle) to describe hash collisions more formally:

> Given *m* items and *n* containers,
> if *m* > *n*,
> then there’s at least one container
> with more than one item.

In this context, items are a potentially infinite number of values that you feed into the hash function, while containers are their hash values assigned from a finite pool.

Build a Hash Table in Python With TDD

Bartosz Zaczyński
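
To make the principle concrete, here is a small Python sketch, not taken from the article. It assumes the finite pool of hash values is modelled by reducing the built-in `hash()` modulo a container count; the helper name `find_collisions` and the sample items are hypothetical.

```python
from collections import defaultdict


def find_collisions(items, n_containers):
    """Group items by hash(item) % n_containers and keep shared containers."""
    containers = defaultdict(list)
    for item in items:
        # The modulo keeps every hash value inside a pool of n_containers slots.
        containers[hash(item) % n_containers].append(item)
    return {slot: held for slot, held in containers.items() if len(held) > 1}


items = ["apple", "banana", "cherry", "date", "elderberry"]  # m = 5
collisions = find_collisions(items, n_containers=4)          # n = 4

# With m > n the pigeonhole principle guarantees a non-empty result,
# whichever items collide on a given run (string hashes vary per process).
assert collisions
print(collisions)
```

Which strings end up sharing a container can change between runs, because Python randomizes string hashes per process by default, but at least one collision is unavoidable whenever there are more items than containers.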
