Join 📚 Kevin's Highlights

A batch of the best highlights from what Kevin's read.

Here’s what the entity resolution query looks like:

![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ec307e-6b10-46fd-8b6a-773f9f163875_1212x728.png)

I’m joining the table with itself on state + zipcode to reduce the search space and using string similarity thresholds for filtering potential duplicates. In entity resolution methodology this is known as “blocking.”

Fundamental Data Engineering Concepts - Part 2

Ergest Xheblati
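The blocking idea in the highlight above can be sketched in a few lines of Python. This is not the author's query, which is a SQL self-join shown only as an image; it is a minimal illustration under stated assumptions. The record fields (`id`, `name`, `state`, `zipcode`), the helper names, the sample data, and the 0.85 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever string similarity function the original uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score in [0, 1] from difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Return (id, id, score) tuples for likely duplicates.

    The threshold is an assumed value chosen for illustration only.
    """
    # Blocking: group records by (state, zipcode) so that only records
    # in the same block are ever compared with each other.
    blocks = defaultdict(list)
    for rec in records:
        blocks[(rec["state"], rec["zipcode"])].append(rec)

    pairs = []
    for block in blocks.values():
        # Compare every unordered pair inside the block once.
        for left, right in combinations(block, 2):
            score = similarity(left["name"], right["name"])
            if score >= threshold:
                pairs.append((left["id"], right["id"], score))
    return pairs


# Hypothetical sample data, invented for this sketch.
records = [
    {"id": 1, "name": "Acme Corporation", "state": "CA", "zipcode": "94105"},
    {"id": 2, "name": "ACME Corporaton", "state": "CA", "zipcode": "94105"},
    {"id": 3, "name": "Acme Corporation", "state": "NY", "zipcode": "10001"},
    {"id": 4, "name": "Bolt Logistics", "state": "CA", "zipcode": "94105"},
]

for left_id, right_id, score in candidate_duplicates(records):
    print(left_id, right_id, f"{score:.2f}")
```

Records 1 and 3 share an identical name but sit in different blocks, so they are never compared. That is the trade-off blocking makes: far fewer comparisons, at the cost of missing duplicates whose blocking keys disagree.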

You likely know what this means even if you don’t know it in those words. The hype cycle, as defined by Gartner, [which tracks it](https://www.gartner.com/en/chat/gartner-hype-cycle), is that series of cyclical events that happens around nearly all emerging technologies: the breakthrough, the “peak of inflated expectations,” the disillusionment, the period of actual serviceable uses of the tech, and the time when it’s adopted. That pinnacle is the groan time, the moment Justin Bieber drops more than $1 million on an NFT. The moment Facebook buys Oculus. The moment the bodega starts taking bitcoin and you know you’ll never be able to escape this thing, whatever it is.

This Is the Worst Part of the AI Hype Cycle

Angela Watercutter

it’s important to realize that **ChatGPT and LaMDA aren’t trained to be correct**. You can train models that are optimized to be correct—but that’s a different kind of model. Models like that are being built now; they tend to be smaller and trained on specialized data sets (O’Reilly Media has a search engine that has been trained on the 70,000+ items in our learning platform). And you could integrate those models with GPT-style language models, so that one group of models supplies the *facts* and the other supplies the *language*.

Sydney and the Bard

Mike Loukides
