melee_warhead:

1) We're in the same space. Where I get concerned is that most of the "math on numbers" problems are descriptive, and they work in obvious ways: add a high number, and the average changes.

The text-math examples are more complex modeling problems where the inflection points of change are less obvious. I'm in agreement that digestibility is likely possible. I disagree with the idea of turning it into a metric.

2) "If you gave 20 interviews to people and said, 'tell me what is important,' they'd probably say pretty much the same thing if you gave them that 20 plus 5 more." And there may be ways of doing this with types of training for these models, as in "retain X clusters but add Y new variables".

I just know that with people, they're doing a thoughtful trade-off evaluation on their clustering approach. Maybe ChatGPT will just have "good enough" clustering, I don't know? My understanding of topic modeling is that there are several different types of approaches, and that it's still domain-specific (as in, the type of solution needs to match the type of problem). If reality strictly works a certain way, the same topics will always show up. However, I think this is more model-like, and less like standard descriptive analysis, at this point in time. As in, a bit more "fiddly" and "hand-wavy" than the comparison set of objective numerical metrics.
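To make that "the approach you pick changes what you get" point concrete, here's a toy sketch (not any real topic-modeling library; the corpus and both heuristics are made up for illustration). Two crude ways of pulling "topics" out of the same three documents surface different words:

```python
import math
from collections import Counter

# Hypothetical toy corpus: customer-feedback snippets, invented for this sketch.
docs = [
    "churn pricing churn support pricing",
    "pricing discount pricing renewal",
    "support outage support latency outage",
]

def top_by_frequency(docs, n=3):
    """Heuristic 1: the most common words across the whole corpus."""
    counts = Counter(w for d in docs for w in d.split())
    return [w for w, _ in counts.most_common(n)]

def top_by_tfidf(docs, n=3):
    """Heuristic 2: words frequent in one doc but rare across docs (tf-idf-ish)."""
    df = Counter()  # document frequency: how many docs each word appears in
    for d in docs:
        df.update(set(d.split()))
    scores = Counter()
    for d in docs:
        tf = Counter(d.split())
        for w, c in tf.items():
            scores[w] = max(scores[w], c * math.log(len(docs) / df[w]))
    return [w for w, _ in scores.most_common(n)]

print(top_by_frequency(docs))  # raw frequency favors ubiquitous words
print(top_by_tfidf(docs))      # tf-idf favors document-distinctive words
```

Same data, two defensible methods, different "topics" — which is roughly the "fiddly" problem: someone has to decide which notion of "important" matches the domain.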

Maybe I'm off-base, or I'm being thrown off by cursory research into topic models a few months back. I can (in theory) see a company getting used to this approach, but it's not obvious.

Benn Stancil:

1) Sure, that's fair. It's definitely not precise; I don't think you could do anything properly scientific this way. It'd really have to be more like humanities research, where there's not only variability but sometimes outright disagreement. (Though as I say that, I do wonder if there'd be some sort of rough "central limit theorem" with this, where if you have large enough samples, every model built in broadly similar ways would converge-ish. But who knows.)

2) I could see "different models do different things" also being related to the topic modeling stuff you're describing. Even if LLMs (and AI generally) weren't fundamentally probabilistic, you can always get different results by asking questions in slightly different ways, training the models differently, using slightly different models, and so on. So even if one company had a standard approach for how they do it, it's almost more cultural. The research analogy might still work there: give 20 interviews to one research team and they'll give you X back. Give the same team 20 plus 5 more, and you'll probably get X-ish. Give 20 to a different team? Who knows; you could get something entirely different.
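As a toy illustration of that same-team vs. different-team point (everything here is an assumption for demonstration: the data, the seeds, and using basic k-means as a stand-in for whatever model a "team" would actually build):

```python
import random

def kmeans(points, k, seed, iters=50):
    """Basic 1-D k-means; the seed controls the random initialization,
    playing the role of 'which team you asked'."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(round(c, 1) for c in centers)

# "20 interviews": responses that roughly cluster around three themes
base = [1, 2, 2, 3, 3, 4, 10, 11, 11, 12, 12, 13,
        20, 21, 21, 22, 22, 23, 24, 25]
extra = base + [2, 12, 21, 22, 3]  # "5 more" similar responses

# Same seed, more data: the centers usually barely move (X vs. X-ish).
print(kmeans(base, 3, seed=0))
print(kmeans(extra, 3, seed=0))
# Different seed: a different starting point, and on messier data than this
# the clustering itself can land somewhere else entirely.
print(kmeans(base, 3, seed=1))
```

The point isn't the algorithm; it's that the output is a function of both the data and a pile of arbitrary-ish choices (initialization, method, framing), so "same inputs, same answer" only holds within one team's setup.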