Join 📚 Kevin's Highlights
A batch of the best highlights from what Kevin's read.
it’s important to realize that **ChatGPT and LaMDA aren’t trained to be correct**.
You can train models that are optimized to be correct—but
that’s a different kind of model.
Models like that are being built now;
they tend to be smaller and trained on specialized data sets
(O’Reilly Media has a search engine that has been trained on the 70,000+ items in our learning platform).
And you could integrate those models with GPT-style language models, so that
one group of models supplies the *facts* and
the other supplies the *language*.
Sydney and the Bard
Mike Loukides

Here’s what the entity resolution query looks like:

I’m joining the table with itself on state + zipcode
to reduce the search space and
using string similarity thresholds
for filtering potential duplicates.
In entity resolution methodology this is known as “blocking.”
Fundamental Data Engineering Concepts - Part 2
Ergest Xheblati
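
The query itself is not reproduced in this highlight, so what follows is only a minimal Python sketch of the blocking idea it describes, not the author's SQL. The field names (`id`, `name`, `state`, `zipcode`), the sample records, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def candidate_duplicates(records, threshold=0.8):
    """Return id pairs that share a block and have similar names.

    The threshold is a hypothetical value; tune it for real data.
    """
    # Blocking: only records with the same (state, zipcode) are ever compared,
    # which plays the role of the self-join condition in the description above.
    blocks = defaultdict(list)
    for rec in records:
        blocks[(rec["state"], rec["zipcode"])].append(rec)

    pairs = []
    for block in blocks.values():
        # Compare each unordered pair within a block exactly once.
        for left, right in combinations(block, 2):
            if similarity(left["name"], right["name"]) >= threshold:
                pairs.append((left["id"], right["id"]))
    return pairs


# Hypothetical sample data, invented for this sketch.
records = [
    {"id": 1, "name": "Acme Hardware", "state": "NY", "zipcode": "10001"},
    {"id": 2, "name": "Acme Hardwares", "state": "NY", "zipcode": "10001"},
    {"id": 3, "name": "Acme Hardware", "state": "CA", "zipcode": "94105"},
    {"id": 4, "name": "Blue Bottle Cafe", "state": "NY", "zipcode": "10001"},
]

print(candidate_duplicates(records))
```

Record 3 has the same name as record 1 but is never compared with it, because the two fall into different blocks. That is the trade-off blocking makes: far fewer comparisons, at the cost of missing duplicates whose blocking keys disagree.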