Join 📚 Kevin's Highlights
A batch of the best highlights from what Kevin's read.
Here’s what the entity resolution query looks like:

I’m joining the table with itself on state + zipcode
to reduce the search space and
using string similarity thresholds
for filtering potential duplicates.
In entity resolution methodology this is known as “blocking.”
Fundamental Data Engineering Concepts - Part 2
Ergest Xheblati
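
The query itself did not survive in this digest, so what follows is not the author's SQL. It is a minimal Python sketch of the blocking idea described in the highlight above: group records by state + zipcode, then compare names only within each group. The sample records, the field names, the `find_candidate_duplicates` helper, and the 0.9 threshold are all assumptions made for illustration, and `difflib.SequenceMatcher` stands in for whatever string similarity function the original query used.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Acme Corporation", "state": "CA", "zipcode": "94105"},
    {"id": 2, "name": "Acme Corporaton", "state": "CA", "zipcode": "94105"},
    {"id": 3, "name": "Blue Sky Bakery", "state": "CA", "zipcode": "94105"},
    {"id": 4, "name": "Acme Corporation", "state": "NY", "zipcode": "10001"},
]


def find_candidate_duplicates(records, threshold=0.9):
    """Return (id_a, id_b, score) for similar names that share a block."""
    # Blocking step: only records with the same state + zipcode are compared,
    # which plays the role of the self-join condition in the highlight.
    blocks = defaultdict(list)
    for record in records:
        blocks[(record["state"], record["zipcode"])].append(record)

    pairs = []
    for block in blocks.values():
        # Compare every unordered pair inside the block exactly once.
        for a, b in combinations(block, 2):
            score = SequenceMatcher(
                None, a["name"].lower(), b["name"].lower()
            ).ratio()
            # Filtering step: keep only pairs above the similarity threshold.
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 3)))
    return pairs


candidate_pairs = find_candidate_duplicates(records)
print(candidate_pairs)
```

Records 1 and 4 have identical names but are never compared, because they fall into different blocks. That is the trade-off of blocking: far fewer comparisons, at the cost of missing duplicates whose blocking keys disagree.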
That’s a lot to take in, but this image can help you visualise it:

German Prepositions – The Ultimate Guide (with Charts)
George Julian
If you’re mathematically inclined,
then you could use the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle)
to describe hash collisions more formally:
> Given *m* items and *n* containers,
> if *m* > *n*,
> then there’s at least one container
> with more than one item.
In this context,
items are a potentially infinite number of values
that you feed into the hash function,
while containers are their hash values
assigned from a finite pool.
Build a Hash Table in Python With TDD
Bartosz Zaczyński
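
The pigeonhole argument in the highlight above can be checked with a small sketch. It is not code from the article: `bucket_index` and `find_collision` are hypothetical helpers, and reducing Python's built-in `hash()` modulo a bucket count is just one simple way to get a finite pool of containers.

```python
def bucket_index(value, num_buckets):
    """Map a value to one of num_buckets containers using its hash."""
    return hash(value) % num_buckets


def find_collision(values, num_buckets):
    """Return (first_value, second_value, index) for the first collision, or None."""
    seen = {}  # bucket index -> first value that landed there
    for value in values:
        index = bucket_index(value, num_buckets)
        if index in seen:
            return seen[index], value, index
        seen[index] = value
    return None


num_buckets = 8                  # n containers
values = range(num_buckets + 1)  # m = n + 1 distinct items, so m > n
collision = find_collision(values, num_buckets)
print(collision)
```

With one more item than there are buckets, `find_collision` can never return `None`, whichever values are fed in. Which two values collide depends on the hash function, but the fact that some pair does is guaranteed by the principle.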