Join 📚 Kevin's Highlights

A batch of the best highlights from what Kevin's read.

it’s important to realize that **ChatGPT and LaMDA aren’t trained to be correct**. You can train models that are optimized to be correct—but that’s a different kind of model. Models like that are being built now; they tend to be smaller and trained on specialized data sets (O’Reilly Media has a search engine that has been trained on the 70,000+ items in our learning platform). And you could integrate those models with GPT-style language models, so that one group of models supplies the *facts* and the other supplies the *language*.

Sydney and the Bard

Mike Loukides

![](https://pbs.twimg.com/media/FzA70wlXsAImjFq.jpg)

Autistic Burnout

Emily♡

Here’s what the entity resolution query looks like:

![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ec307e-6b10-46fd-8b6a-773f9f163875_1212x728.png)

I’m joining the table with itself on state + zipcode to reduce the search space and using string similarity thresholds for filtering potential duplicates. In entity resolution methodology this is known as “blocking.”

Fundamental Data Engineering Concepts - Part 2

Ergest Xheblati
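
The query in the highlight above survives only as an image, so the author's actual SQL isn't reproduced here. As a rough illustration of the blocking idea it describes, here is a minimal Python sketch. The field names (`id`, `name`, `state`, `zipcode`), the sample records, and the 0.8 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever string similarity function the original query uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_candidate_duplicates(records, name_threshold=0.8):
    # Blocking: group records by (state, zipcode) so that only records
    # in the same block are compared, instead of every possible pair.
    blocks = defaultdict(list)
    for rec in records:
        blocks[(rec["state"], rec["zipcode"])].append(rec)

    pairs = []
    for block in blocks.values():
        # Equivalent of the self-join: every unordered pair within a block.
        for left, right in combinations(block, 2):
            score = similarity(left["name"], right["name"])
            # Keep only pairs above the (assumed) similarity threshold.
            if score >= name_threshold:
                pairs.append((left["id"], right["id"], round(score, 2)))
    return pairs


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "name": "Acme Hardware", "state": "CA", "zipcode": "94105"},
    {"id": 2, "name": "Acme Hardwear", "state": "CA", "zipcode": "94105"},
    {"id": 3, "name": "Bay Plumbing", "state": "CA", "zipcode": "94105"},
    {"id": 4, "name": "Acme Hardware", "state": "NY", "zipcode": "10001"},
]

candidates = find_candidate_duplicates(records)
for left_id, right_id, score in candidates:
    print(left_id, right_id, score)
```

Records 1 and 2 share a block and have near-identical names, so they surface as a candidate pair. Record 4 has the same name as record 1 but sits in a different state and zipcode, so the two are never compared. That is the trade-off of blocking: far fewer comparisons, at the cost of missing duplicates whose blocking keys disagree.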

...catch up on these, and many more highlights