I'm not sure this is a good explanation, and I'm not sure anyone will bother to read it, but I don't think you can properly discuss "AI" without understanding at least the basics of machine learning. I'm not talking about sitting down and getting a degree, or learning programming, or anything like that, but understanding, at least on a conceptual level, what it is trying to do.
All of this is built on the foundations of data science and statistics. Just about anyone can now interface with natural language processing (NLP) models (like the ones behind ChatGPT and Claude and whatever else you're using that generates text), and it feels like you're having a conversation, but that's all the result of a lot of data being fed into an algorithm that tries to "predict" what comes next.
Essentially, it asks itself: "Given that these words / characters / sentences (it varies) were fed to me, what comes next?" Its reference for this is a human-made corpus of content, either manually curated or just fed in such large amounts that eventually it "gets it."
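If you want to see the bare-bones version of that "what comes next?" idea, here's a toy Python sketch that only ever looks one word back (real models consider far more context, and the corpus here is obviously made up):

    from collections import Counter, defaultdict

    # A made-up "corpus"; real models are trained on billions of words.
    corpus = "the cat sat on the mat and the cat slept and the cat purred".split()

    # Count how often each word follows each other word (a bigram model).
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def predict_next(word):
        # Turn raw counts into probabilities, then pick the likeliest word.
        counts = following[word]
        total = sum(counts.values())
        probs = {w: c / total for w, c in counts.items()}
        return max(probs, key=probs.get)

    print(predict_next("the"))  # -> "cat" ("the" is followed by "cat" 3 times out of 4)

That's the whole trick, conceptually: count what follows what, convert to probabilities, pick what's likely.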
When you open Vahaduo and run a calculator, you get "fit" back, and we all know how these things can "overfit": they try to mathematically maximize the fit. I'm not saying Vahaduo is some kind of machine learning/AI/whatever buzzword you want to use thing, but "fit" is in the end what these NLP models are looking for too. They're fed a shitton of data, then they run on the probabilities calculated from that data.
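To make "overfit" concrete, here's a small numpy sketch (the data is synthetic, and to be clear, this is not how Vahaduo works internally): a straight line that roughly follows some noisy points versus a degree-9 polynomial that hits every single point but goes haywire everywhere else.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 10)
    y = 2 * x + 1 + rng.normal(scale=0.3, size=x.size)  # a noisy line

    line = np.polyfit(x, y, deg=1)    # simple model: a straight line
    wiggle = np.polyfit(x, y, deg=9)  # overfit: enough wiggle to hit every point exactly

    # The degree-9 curve nails every training point (near-zero error on the
    # data it saw) but swings wildly between and beyond them; the straight
    # line misses each point a little yet captures the real trend.
    print(np.polyval(line, 1.2), np.polyval(wiggle, 1.2))

"Best possible fit on the data you already have" is not the same thing as "best model."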
I think visual examples help, but it's hard to show exactly what we're talking about in the field of language processing. Instead, here's a little gif of linear regression, which is pretty basic as a concept but can be extremely effective (the gif might take a second to load):
I know visually this looks like "okay, it's just moving a line a bit," but in reality it's creating a function whose outputs get as close as possible to all the individual outputs (y-values) of the points, given their inputs (x-values). Like when in math class you had f(x) = x^2: the function is f, the input is x, and the output is x^2.
Because the points in our linear regression example aren't in a clearly defined straight line (they're all over the place), our ideal function is going to be the one that is the most accurate to them, not one that is completely accurate. (Not all regression is linear, especially not in machine learning... but that's another topic, and even then the points are random as hell.) So, just like before, the goal is to maximize the fit.
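Here is roughly what the gif is doing under the hood, as a tiny pure-Python sketch (the data points are made up): start with a bad line, then repeatedly nudge its slope and intercept in whatever direction shrinks the total squared error. This is gradient descent; each loop iteration is one "frame" of the line moving.

    # Made-up points that roughly follow y = 2x + 1.
    points = [(0, 1.2), (1, 2.9), (2, 5.1), (3, 6.8), (4, 9.2)]

    m, b = 0.0, 0.0  # start with a terrible line: f(x) = 0
    lr = 0.01        # how big each nudge is (the learning rate)

    for step in range(5000):
        # Gradients of the mean squared error with respect to m and b.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in points) / len(points)
        grad_b = sum(2 * (m * x + b - y) for x, y in points) / len(points)
        m -= lr * grad_m
        b -= lr * grad_b

    print(f"f(x) = {m:.2f}x + {b:.2f}")  # ends up close to the trend in the points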
Almost all modern phones have a predictive text (or "autofill") feature. That feature learns your typing style over time from your inputs; at first it may not guess very well, since the original model is based on the typing styles of everyone, but it gets better and better at your style. Even then, it's not always going to get exactly the word you were going to type next, though whatever it offers will usually "make sense."
If you use some random uncommon word more often than most people (for me, it's "oscillation"), it will eventually "know" that you use that word more; there's a higher probability of that word coming next. It's all math, it always has been math.
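A hedged sketch of that personalization idea (every number and word below is invented): start from a generic "everyone" model and blend in counts from one user's own typing.

    from collections import Counter

    # Pretend base probabilities for what follows "the", learned from everyone.
    base = {"best": 0.30, "same": 0.25, "oscillation": 0.01, "other": 0.44}

    user_counts = Counter()  # filled in as this one user types

    def personal_prob(word, weight=0.1):
        # Mix the generic model with the user's own usage counts. A real
        # system would tune the blend; the fixed weight keeps it simple here.
        total = sum(user_counts.values())
        user_p = user_counts[word] / total if total else 0.0
        return (1 - weight) * base.get(word, 0.0) + weight * user_p

    for _ in range(20):  # the user keeps using one rare word
        user_counts["oscillation"] += 1

    print(personal_prob("oscillation"))  # 0.109, way above the generic 0.01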
For longer generations, it's a pretty similar concept, but it's not just considering the next word. The entire input is taken into account, and "coherence" becomes a thing. The program does not "know" what it has written, but let's say it's trained on a bajillion things, and some of those things are conversations that go like this:
Adam: Hi, what are you going to bring up at the meeting tomorrow? I need some ideas on a topic. I have already started my presentation, here's what I have... {blablabla}
Alice: Hello, I'm bringing up this. I think you could {more blablabla}.
I never claimed to be a good writer, or a non-lazy one. But Alice is repeating some of what Adam says and building on it. Eventually, the model is going to figure out that when replying, it should take some of the ideas it was given and build on them.
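Here's a toy illustration of why more context matters (the training text is made up): predict the next word from the last two words instead of just one. One word of context can't tell Adam's habits from Alice's; two words can.

    from collections import Counter, defaultdict

    text = "adam brings slides to the meeting alice brings ideas to the meeting".split()

    one_word = defaultdict(Counter)   # next word given 1 word of context
    two_words = defaultdict(Counter)  # next word given 2 words of context

    for i in range(len(text) - 2):
        one_word[text[i + 1]][text[i + 2]] += 1
        two_words[(text[i], text[i + 1])][text[i + 2]] += 1

    print(one_word["brings"])              # slides or ideas? a coin flip
    print(two_words[("adam", "brings")])   # slides -- the extra word resolves it
    print(two_words[("alice", "brings")])  # ideas

Modern models take this to the extreme: instead of two words of context, they weigh the entire input, which is where the "coherence" comes from.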
The thing I want to emphasize is that natural language processing is not the only form of machine learning/AI. These concepts (and the deeper ones we use nowadays) have been around for quite some time, and you're not always going to be interfacing with some kind of NLP model. Most of those are trained in a self-supervised way nowadays (they're just given a giant amount of data and eventually learn how to do it "right"). To me, it's probably even more exciting that we now have the computing power to process scientific information en masse and a more powerful tool to find patterns.
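For a taste of what that looks like outside language, here's a small clustering sketch (synthetic data; scikit-learn's KMeans is just one common pattern-finding tool among many). Nobody tells the algorithm which group each measurement belongs to; it recovers the groups from the data alone.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(42)
    # Two pretend populations of 2D measurements, mixed together unlabeled.
    group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
    group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
    data = np.vstack([group_a, group_b])

    # No labels are given; k-means finds the two clusters on its own.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
    print(labels[:5], labels[-5:])  # the first and last points get different labels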
There's a lot more than that, but yes, this was mostly just for myself to type something out, since I have been itching to do it.