The Future is Fine Tuned (with Dev Rishi, Predibase)
Dev Rishi: Like, I'm sure in the
next three to six months, we're going
to see great open source variants
that do something very similar.
Daniel Reid Cahn: So the arms race is good, basically.
Dev Rishi: I think the arms race is, like, net good for everybody, for all consumers.
Daniel Reid Cahn: Oh, awesome.
So we can just jump in.
Well, thanks so much for joining us. I'm here with Dev Rishi. Dev did his undergrad and master's at Harvard in CS, went on to work at Google, where he worked on Firebase, Kaggle, and then Google Cloud AI, which you mentioned became Vertex AI just towards the tail end of your time there.
Yeah.
And then you started Predibase, which you founded to do a couple of things. You started with, if I understood, the low-code framework Ludwig.
Mm hmm.
Yeah.
And then you guys also produced LoRAX, and now you focus on Predibase: fine-tuning and hosting language models that are task-specifically trained.
Is that right?
Dev Rishi: Yeah, that's pretty much perfect.
You know, when we started Predibase
in 2021, it was on top of Ludwig,
which is a low code framework for
building deep learning models.
And we were convincing everyone
why deep learning mattered.
And then in 2023, everyone cared about
deep learning, but just one variant
of them with large language models.
So we really, I think, pivoted the
company to be very focused on LLMs.
And our, you know, unique view is: how do we help you fine-tune those models, and how do we help you deploy them?
Daniel Reid Cahn: So, language. Curious: you weren't just focused on language, then became focused on language.
Do you think that language is
special, or do you think people
are focused on it today because
that's, like, what's possible?
Dev Rishi: I think two things.
One, I think language is special.
I also think we're focused on
it because we're very early.
Like, you know, the industry is early, and as, like, startups, we're also quite early. If you think about a lot of the kind of core models that have existed for a while, the transformer architecture, I think, has been applicable for language and, like, computer vision in particular. Now you've applied it to audio, you've applied it to video and multimodal, but, like, the types of models people actually put into production pre-OpenAI were probably like BERT, Longformer, DistilBERT, these variants that were pre-trained deep learning models. And then, like, you know, ViT and Vision Transformers from Google. Those are, like, the two areas.
And so I think language is special
because it's like, because it's possible
to get started very early on that.
And we had a lot of prior
art to be able to build with.
And a lot of use cases are
transcribed in language today.
Daniel Reid Cahn: You say it's easy to get started in language. Like, why? Is language hard?
Dev Rishi: For two reasons.
One, I think there is that
history and background already.
So we like have a lot of tasks
that are already defined.
A lot of the types of tasks we see
people looking to apply LLMs for today.
Four years ago, companies were trying to
do a lot of those same types of tasks, but
using different model variants like BERT.
So the use case imagination,
the evaluation criteria, the
datasets themselves, I think are
already set up nicely in language.
There's more prior art there, I think
is really what I want to say, actually.
Daniel Reid Cahn: Because, I mean, I
tend to think the other, the other way
to look at it is just like information
theory, and like how condensed information
is, that like, if a picture's worth a
thousand words, like, that's awesome, but
it has like a ton of pixels, you know?
Yeah.
So, you know, you now have, like, megabytes of data that really represent only a thousand words. A thousand words is pretty small. I mean, given that you guys have focused on infrastructure and tooling, do you think, you know, is it a good place to be in the language space, or is it sort of scary that we're all handicapping ourselves by just focusing on this relatively easier problem?
Dev Rishi: To be honest, it's both.
Like I feel like it is a good place
to be in the sense that the compute
infrastructure and layout, I think, is much more optimized for being able to do the types of production, lower-latency, and higher-throughput applications we'd want with language models.
You know, even like, again, four
years ago on like, V100s, T4s, you
could effectively serve some of these
types of like smaller language models.
Today, you can get the type
of throughput you'd want.
I kind of struggle to imagine, if you're really fine-tuning multimodal video models, what kind of throughput you'd actually want to be able to do, right?
Yeah.
If you're a security company and you
have like security camera footage
from all of your different offices,
I dunno if there's realistically a
great way to be able to process all
of that video that comes in daily for
many different cameras very quickly.
However, if you're a bank or, you know, another company, and you have many emails that go through, we know we can scale that infrastructure up and down.
Daniel Reid Cahn: So, just to clarify, you're saying if I am a security company and I want security cameras...
Yeah.
I can't use multimodal language models, because they're super heavy and powerful, and if my goal is just to see that someone walked in the door, like, that's crazy overkill, way too slow.
Dev Rishi: I think it's harder to use in production.
Like, I think you can start to use it, but what you can't do is probably throw all of your video feeds directly at it without it being really cost-prohibitive.
Especially given kind of like
the crunch on what hardware
you need to be able to run on.
Whereas I think you can go into production
with something like email use cases,
where people have just as many emails
that get sent probably as like, you know,
security camera footage as an example.
Daniel Reid Cahn: I mean,
there also are general purpose,
like, image models, right?
Yes.
Like segment anything model, like SAM.
I don't know if that's outdated now, but.
Dev Rishi: Yeah, so I
guess, it makes sense.
And that's where, actually, honestly, that's where I see the progression, though. It's like language really kind of first, CV and computer vision and images next, but then there's even higher fidelity of images mixed with audio mixed with video, where I think things more or less get heavier and heavier over time. And you asked, is it convenient or is it scary, and I said both. The reason I'd say it's scary is because the space is moving so quickly that what you want to make sure to do is offer benefits that aren't going to get commoditized at the infrastructure layer, right? Like, yeah, it's possible to do language modeling; that means a lot of people are doing language modeling, which means the infrastructure for it is getting cheaper and cheaper. And so how do you think about your moat?
Daniel Reid Cahn: Yeah, whereas, like, very few companies are doing video because it's so freaking hard.
Dev Rishi: Yes.
Daniel Reid Cahn: So, language. You mentioned you want to avoid being commoditized. I guess, just correct me if I'm wrong, but, like, it seems like model hosting is getting commoditized. Is it?
Dev Rishi: Yeah, I think that there are some differentiators that model hosting providers will go ahead and try to offer. There are two key ones that I think about.
The first is like performance
on throughput and latency.
And so some providers, like Groq, I think, are a great example of ones that have gone to the hardware layer to be able to optimize that. The second, I would say, is around workflows for, like, model hosting, but usually there are more advanced types of use cases.
For us, the workflow we're most
interested in is when people have
multiple fine-tuned models, which is where we developed LoRAX, a framework for hosting many fine-tuned LLMs.
But we've seen other people invest
in hosting solutions that like, allow
you to do shadow deployments, A B
testing, you know, incremental rollouts,
blue green deployments, and others.
Just the part that I think is getting
commoditized is like if you're just doing
base model inference and that's it, it's
hard for me to understand why you pick one
provider versus another other than cost
Daniel Reid Cahn: and quality.
I mean, I'm totally with you, by the way. I think, I won't list all
of them, but there are other companies
today offering free APIs of base models.
That must be really hard to compete with.
Dev Rishi: Yes.
I think that in my view, it's not actually
something I want to compete with at all.
So, I think there are companies, and I actually think some of these companies have made meaningful amounts of revenue even, but I don't know if it's necessarily high-margin revenue.
That's probably the thing
that needs to get litigated.
Right.
But essentially what you have is you
sign up for these massive pre commits
to GPU clusters, you optimize the
throughput of your models a lot, and
then you hope that you get really
good utilization across those, so
that you can more or less squeeze
the lowest price per million tokens.
Daniel Reid Cahn: Yeah, whereas with, so, LoRAX: can you just quickly explain what LoRAX is?
Dev Rishi: Definitely.
So LoRAX is essentially our serving framework at Predibase that we open sourced last year. It's a multi-LoRA serving infrastructure, and what that really means is: we observed that when most people were fine-tuning LLMs, they were doing a parameter-efficient, or LoRA, fine-tuning, where you only customize a small subset of the weights of the model, usually much less than 1%.
Daniel Reid Cahn: So the theory, just to
clarify, is like, you can fine tune an
entire model, but the vast majority of
the language model is just like reading
English, understanding basic concepts.
And so realistically, like, like a human,
you could have very specialized humans
who are really damn good at their job, but
99 percent of their brain is identical.
So we think, like, 99 percent of the model can be standardized, and 1 percent specialized.
Dev Rishi: I think that's
a great analogy, yeah.
Actually, Databricks just put out a paper today on LoRA fine-tuning versus full fine-tuning. Those are generally, like, the two types of domains, and obviously there are different variants. There's LoRA, there's DoRA, there's quantized LoRA, and a few others.
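For readers who want to see what that looks like in practice, here is a minimal sketch of a LoRA setup using Hugging Face's transformers and peft libraries; the model name and every hyperparameter value are illustrative assumptions, not Predibase's configuration.

```python
# A minimal sketch of attaching a LoRA adapter to a base model with
# Hugging Face's `transformers` and `peft` (values are illustrative).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_cfg = LoraConfig(
    r=16,                                 # adapter rank: controls adapter size/capacity
    lora_alpha=32,                        # scaling factor applied to the adapter update
    target_modules=["q_proj", "v_proj"],  # which attention projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
# Reports the trainable fraction; with a config like this on a 7B model it is
# typically well under 1% of the weights, the "99% shared" picture above.
model.print_trainable_parameters()
```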
Daniel Reid Cahn: Like, when I first saw the LoRA paper, it's really hard to overstate. Like, it was freaking surprising as hell. Like, I do not think it was remotely obvious.
Like,
Dev Rishi: I don't think it's, like, I don't think it's intuitive. And, like, the key intuition is you essentially discover a small transformation, a small subset of weights, that you can go ahead and apply against, like, the overall model.
And then that is very customized towards
the tasks that you're looking to do.
But then I think like,
Daniel Reid Cahn: Even in the paper, I'm just saying, like, they acknowledge at some point, they're like, we know LoRA is not going to work for everything, but we haven't yet found a case where it doesn't work.
Dev Rishi: From our standpoint, I've
been really impressed with the broadness,
the broad applicability for it.
And I think actually the
intuition goes towards you.
I'll talk a little bit about why I
see that broad applicability in a
second, but the intuition, I think,
is actually very similar to what
you said, which is, let's imagine
you had, you know, two human brains.
99 percent of the overlap
is actually quite similar.
A lot of, like, you know,
both of us understand the
fundamentals of English language.
We can both see that these mics
are black and, you know, that we're
sitting on blue chairs, for example.
But there are things
that make us very unique.
It's a very small subset of,
like, the actual weights or
neurons, like, in our brains.
And what the paper from Databricks that was released earlier today, I think, actually indicated was: LoRA fine-tuning is not necessarily as good as full fine-tuning when you have lots and lots of data and you want to relearn something extremely specialized, but it also is not as prone to catastrophic forgetting. And what I think is important there is: the kind of base model knowledge is very important. That foundational understanding of the English language, those understandings, you don't want to overwrite a lot of those.
And that's why when we actually talked
to customers a year ago, I talked to
some customers that were like, I don't
believe in fine tuning, because I
fine tuned and my model got way worse.
Like, you know, worse than base model performance. And this was a common thing. I think it was because of the same things we saw with BERT models in 2017, with catastrophic forgetting and others.
What we did in LoRA Land, you know, LoRA Land was a project where we essentially picked out 27 different datasets. They represented tasks from legal clause classification to question answering to comment classification to content moderation. And we LoRA fine-tuned, in fact, we QLoRA fine-tuned, just for efficiency, quantized LoRA, Mistral 7B across those 27 different tasks.
And initially our goal was just to show, hey, actually, you know, your LoRA fine-tuned model is better than your base model. But what we actually saw was that for 25 out of those 27 tasks, it was significantly better than the base model, up to the point where it was kind of matching or exceeding GPT-4-level performance.
And so I agree with the authors of the QLoRA paper, where they said, you know, I'm sure it's not applicable for everything, but we're not sure what it's not applicable for yet. We found LoRA to be extremely applicable across, you know, many, many different types of tasks that we've benchmarked.
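As a rough illustration of the "QLoRA fine-tuned just for efficiency" idea: a QLoRA-style run keeps the frozen base model in 4-bit and trains only the small adapter on top, which is a big part of why a single run can stay so cheap. The sketch below uses Hugging Face's transformers, bitsandbytes, and peft; the model name and settings are assumptions for illustration, not the LoRA Land configuration.

```python
# Hedged sketch of a QLoRA-style setup: 4-bit quantized base model, trainable
# LoRA adapter on top. Requires the `bitsandbytes` package and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,                      # keep the frozen base weights in 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_cfg,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # prep for k-bit training

model = get_peft_model(base, LoraConfig(r=16, task_type="CAUSAL_LM"))
# From here, a standard supervised fine-tuning loop trains only the adapter.
```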
Daniel Reid Cahn: Yeah, that makes sense.
And then I guess the other big benefit
is you mentioned, like, the efficiency
thing, which is like, if you're doing,
if you're offering a base model, You
can scale to a lot of GPUs, which
means you have like some threshold
of the number of requests per minute
you can handle across the platform.
So let's say a company is like, you know,
we're going to offer, you know, inference
to a thousand companies, each get a
thousand requests per second or whatever
it is, thousand requests per minute.
You know, and therefore we need a
million requests per minute to handle.
We're going to get those GPUs on
all the time, and as long as we get
utilization high, we can make money.
The problem is, if you're in fine-tuning land, you can't do that, generally, right? Because you would need, you know, each company to have, let's say, their own fixed thousand requests per second. I guess, in your, in LoRA, what's really cool here is that you can host many different LoRAs on a single machine, right?
Dev Rishi: That's exactly right, yeah.
So I think the origins of LoRAX really came in because we wanted to have a free trial version of our product.
We wanted anyone to be able
to try out fine tuning.
And it turned out that
wasn't that big of a deal.
You know, we could even give away GPU access for fine-tuning, because the models are pretty inexpensive to train. The 27 ones that we benchmarked in LoRA Land were, on average, $8 or less to fine-tune for a given run.
So they weren't that expensive to be able
to give away for free trial access, but
what was gonna get a lot more painful was, once you fine-tuned that model, I was a lot more concerned about people deploying them to test them.
Because if every single person who is
fine tuning a model had to spin up a
new GPU, that was gonna get very kind
of painful and restrictive for us,
because there's only a limited set of,
you know, A100s that I would say the
world has, and also, you know, a much
smaller subset of that that we have.
Daniel Reid Cahn: Yeah, yeah.
Dev Rishi: And so we needed this technology to allow free trial users. This is the origin story for LoRAX. We needed a way to be able to allow free trial users to test out their models. Because it probably wouldn't be good enough in a trial to say, hey, congrats, you fine-tuned a model, now pay us in order to be able to go ahead and try out a single inference. So we developed this way where we could host a single base model, call it Mistral 7B. And then, you know, Daniel could come in with his fine-tuned LoRA, and I could come in with my fine-tuned LoRA, and we'd just exchange those LoRAs on top of the same base model. And that was, you know, where LoRAX came out, and a lot of our kind of, I'd say, innovations in the open source have really been: how do we make that exchange extremely efficient?
Daniel Reid Cahn: Which is crazy cool, by the way. I mean, like, I think it's hard to overstate the impact. Typically, I mean, the big thing is cold start time to me, which is, like, if I don't have dedicated compute and I am a free trial user, or, like, at Slingshot, we play with a lot of new techniques constantly. We might train 20 models just with minor differences on some experiment, and we're like, let's run inference on all of them. If you're just doing arbitrary compute, the cold start time to load 70 billion parameters into memory can be huge. You know?
Exactly.
First, allocate the machine. And if I want to make ten requests to ten models, then that means, like, either I need to get ten machines in parallel, each one taking 15 minutes to spin up, and no one has that kind of compute, you know, or you go to Predibase and you're like, hey, can you load all 10 onto one machine? And it's like, sure, I'll do each one. It'll take me half a second.
Dev Rishi: Yeah, exactly.
You know, when we first launched LoRAX, we thought one of the use cases would be people who had many, many different fine-tunes for many different use cases.
So an example, you wanted to fine
tune for many different customers.
Turns out the earliest use case is just AB
testing different versions of the models.
And that's obvious to
say like in retrospect.
But usually when you have fine tuned,
you didn't just fine tune one model,
you've kind of created a workflow, you've
tweaked a few different parameters.
Daniel Reid Cahn: Or at least, like, for us, multiple checkpoints, if we're like, hey, did more training actually help?
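To make the "many fine-tunes on one deployment" picture concrete, here is a hedged sketch of what a client call against a shared deployment can look like. It assumes a LoRAX-style, TGI-compatible REST endpoint where each request picks an adapter by ID; the URL, adapter names, and exact field names are illustrative assumptions rather than a verified API reference.

```python
# Illustrative client for a multi-adapter deployment: the same base model
# serves every request, and each call selects a different fine-tuned LoRA.
import requests

LORAX_URL = "http://localhost:8080/generate"  # hypothetical local deployment

def generate(prompt: str, adapter_id: str | None = None) -> str:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64}}
    if adapter_id is not None:
        # Swap in a specific fine-tuned adapter on top of the shared base model.
        payload["parameters"]["adapter_id"] = adapter_id
    resp = requests.post(LORAX_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Two hypothetical adapters sharing one base-model deployment:
print(generate("Classify this support ticket: ...", adapter_id="daniel/ticket-classifier"))
print(generate("Classify this support ticket: ...", adapter_id="dev/email-compliance"))
```

Switching the adapter between requests only moves a small adapter's worth of weights rather than reloading billions of base-model parameters, which is what keeps the cold-start cost close to zero.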
All right.
I'm being really nice.
So let me ask on the harder
side, I guess a little bit.
There's one other company doing this: OpenAI. Are they a competitor?
Dev Rishi: I view OpenAI in two ways. I actually love when people use OpenAI. Like, look, I think they are a competitor. But the way that I see it is, today the vast majority of workloads that exist on GenAI are on a proprietary model like OpenAI's.
And I think, you know, open
source is a very small percentage.
I think that there are absolutely roles for both of these to exist. So I don't think it's, like, an either-or market. I don't think everything is open source. I don't think everything is OpenAI.
I just expect the market,
like the overall pie, to grow.
And I expect the overall market
share of open source models to grow.
The reason I expect that market share
of open source models to grow, and at
Predibase we do all open source LLMs
and fine tuning of open source LLMs.
The reason I expect that is, I think
it's a very natural progression.
OpenAI is extremely easy to use.
We have to give them a lot
of credit for that, right?
Almost all of our customers
started on OpenAI.
And they decided to go ahead
and fine tune a smaller task
specific model for three reasons.
Cost, latency, or control.
And so when somebody isn't an
active user of OpenAI, I almost
get a little bit nervous.
Because I'm like, well, do you know what
your GenAI use cases are going to be?
Is there any reason you haven't
already experimented at that level?
Whereas some of our most qualified users are saying things like, I have lots of GenAI, I have lots of OpenAI GPT-4 load. It's just getting very expensive or very slow at scale, I'm getting throttled on rate limits, or, you know, I built my POC and prototype there, but to actually go into production, my company isn't going to allow me to use that external service, and I want to fine-tune a smaller open source model.
Daniel Reid Cahn: That actually
makes a ton of sense, yeah.
I like the framing of like, I want
to know that you've used OpenAI.
Dev Rishi: Yeah, they're a competitor, and they're also, I'd say, a gateway drug to open source.
Daniel Reid Cahn: I think that's interesting. So, last week, it hasn't been published yet, but my last podcast was with this guy Talfan Evans, a friend of mine at DeepMind who works on pre-training of Gemini.
And I asked him similarly, like, is there
going to be one model to rule them all?
And he surprised me by saying no.
And he was like, no, I think that
companies like OpenAI care only
about the very biggest models.
I think there will be a
space for smaller models.
But just to ask you, I mean, Do you think
there will be one model to rule them all?
Dev Rishi: No, I don't think
the history of machine learning
has ever been like that.
And I don't see that like
really going forward.
The main reason I think
that there won't be.
One model to rule them all is, I
think that there will necessarily
be a price performance versus
quality trade off, like over time.
Like, I think you will necessarily
see larger models are a little bit
better, from like a, you know, quality
standpoint in a general setting, and
smaller models are cheaper to serve.
And I think that we see this same experience today: people distill these larger models into smaller, task-specific models once they have very specific datasets as well.
Just to be
Daniel Reid Cahn: clear, I think there
are like two separate things here.
Yeah.
The first being like, will there
be one general purpose model
that's so powerful that we could
just prompt it to do everything?
Hmm.
Which I think is, that's definitely, you know, up for a lot of contention.
I think there's also the element of just
like performance, like in terms of like,
it might be that that happens eventually,
but it just is a long time in the future.
The other question then is just like, what
if OpenAI's models are so good that fine
tuning them is just so freaking good, even
if you took their smallest possible model?
What do you think?
I mean, they don't have
a small model right now.
GPT-3.5 is the smallest.
Dev Rishi: Yeah.
Daniel Reid Cahn: Do you think there's
a reason why they're not going smaller?
Dev Rishi: I think, I mean, so I'm just going to speculate as to what OpenAI would say, but I think
that's actually pretty aligned.
Like, I think their core
mission is around, you know
essentially building towards AGI.
And I think like if that is your
core mission, one thing I've learned
as a startup is you have to be
extremely focused on what you do.
And I'm not convinced what OpenAI wants
to be able to solve, necessarily, is
like, how to be able to deploy the
most optimized, small, task specific,
fine tuned models that can be deployed,
you know, as individual instances that
people have high degrees of control
with inside their VPC, versus, how
do we go ahead and solve alignment
issues for my single, large, big model?
So that's why I think, for OpenAI, it's, like, not antithetical, but it's a detraction from focus on their core mission.
Daniel Reid Cahn: I think it's interesting, 'cause Talfan said the same thing.
Yeah.
And he's coming from the opposite; he's coming from a big company.
Right.
You know, literally Gemini.
But he had a similar conclusion.
I think you're both wrong.
I'm like so confused by this.
Mm.
Like OpenAI had smaller
models for a long time.
They don't right now.
Yeah, but they did.
That's one thing.
And secondly, yes, their focus is
on AGI long run, but they've also
been very consistent about this
foundation model mentality where
they want people fine tuning, right?
So what I wonder is like if they believe
themselves to be a foundation model
where people are going to train on top,
doesn't that necessarily mean that they,
you know, want lots of fine tunes and
you know, even if that's I don't know.
Dev Rishi: I think the piece
that I feel like is most critical
for this is like the G in AGI.
Like my view is OpenAI is going
to solve things like in as
generalized of a setting as possible.
But there are like very specific,
like domain specific models that
are foundation models as well that
people go ahead and like train.
Code Llama being like one of the
most like popular subsets, right?
Daniel Reid Cahn: Yeah, but I mean, the most popular code model is still OpenAI's. It's still OpenAI.
Yeah.
Through GitHub Copilot.
Dev Rishi: But I think, like, what
I, But I think if you had, like,
a low latency code classification
use case, as an example.
Daniel Reid Cahn: I'm just mentioning, like, a good example here is, like, Codex is super low latency. It's their low-latency distillation of GPT, basically.
Yeah.
Right.
Dev Rishi: Yeah.
But I think that like, will OpenAI
want to offer smaller distillation
versions of their own models?
Daniel Reid Cahn: They already are, Codex, I'm just saying. Like, that's literally Codex.
Dev Rishi: Yeah, but I haven't seen this be a key focus for them, right? Like, if I think about what OpenAI's key business is, Codex exists, but I think there are so many others in the open, like, for example, ServiceNow's StarCoder. You know, the code foundation model space is itself quite competitive, and I think you're going to start to see this across every single subset of domain, like medical, legal, you know, low-latency hardware on-device.
Maybe OpenAI will go ahead and start to chew into some of this. I think there's always a risk. Anytime you have, like, a hyperscaler, there's always a question of, like, well, why can't they do X, Y, Z different things? But I feel like, if you ask what OpenAI is focused on today, I feel like they're focused on, you know, how do we get to the next level of performance, from GPT-4 or GPT-5, for example?
And like, how do we
Daniel Reid Cahn: But then GPT-4o, I mean...
Dev Rishi: And how do we get through some of these developer... Like, I think what they're most interested in is, from a research standpoint, building towards AGI, which is where I see that performance side. And from a business standpoint, building the developer ecosystem. So I think one of the big things about GPT-4o was: how do we eliminate some of the frictions we consistently hear from developers about using OpenAI?
Daniel Reid Cahn: That's fair. And I think that's what they care about. You know, they can do that without it being their central focus, because it's not that far off. But I guess the point here being, if they really wanted to have the best ecosystem to host their product, it's basically LoRAX, and lots of LoRAs of small models. That could be a very different business model that they might or might not want to touch.
Dev Rishi: I think that's true, but I think in that world they'd have to compete with a decentralized model, essentially. Where today, you know, you have many different institutions that are putting out some of these very domain-specific, like, foundation models.
Daniel Reid Cahn: Yeah, but, for code, and, like, just to mention, so, on OpenAI: we have a partnership with OpenAI.
We also train our own models.
But from what we've heard, and I believe it was not confidential, they mentioned that they had 150 partners at the time that were partnering to fine-tune GPT-4.
I think.
I could be getting that wrong.
I could be.
Anyway, but their philosophy
when we had spoken to them was
very much like we want AGI.
We want to cover every use case.
We could do that through one model to rule them all, but our actual main plan is probably to support these foundation models, like have 150, perhaps, or whatever, a thousand companies, where we have, like, you know, maybe a specific AI-for-immunology startup that builds their immunology model on top of GPT-4.
Personally, I think about this so
much because it affects my space,
but like, I'm so conflicted.
Like on one hand, are we stuck relying
on OpenAI because their models are
going to be so freaking good that
the only choice is to fine tune them?
Or, you know, is OpenAI focused so much on generality, the biggest fish, the general-purpose assistants, that we should, you know, not rely on them?
Dev Rishi: Empirically, hasn't OpenAI been focused on generality, though? Like, if you think about their subset of usage today, I would hazard a guess that 98% of usage, if not more, is probably, like, GPT-3.5 Turbo, GPT-4, and, you know, GPT-4o. Like, it's very much these general-purpose models rather than small, task-specific ones.
Daniel Reid Cahn: I'm not confident about that. I would have to look into it.
Yeah.
Obviously I don't know,
but I think of demos, yes.
I mean, you're probably right, 98
percent of usage might even just be
demos, but I do think among companies
I tend to talk to, I'm not sure.
You might be right, but I wouldn't be nearly as confident about the 98%.
Dev Rishi: I would say, of OpenAI usage, I think that's true.
What I think people do is then build a
lot of workflows so that they can then
customize those models, or they, you
know, graduate to the point where they're
using a different open source model.
Daniel Reid Cahn: Could be. And then the other thing is, we were talking about pricing.
Yeah.
When we were talking about GPT-4o, you were not surprised by the huge price decrease?
Dev Rishi: Yeah, Look, I think if you
look at the history of like OpenAI's
price cuts over the last 18 months,
you have seen pretty substantive
cuts in price and pretty substantive
like improvements in latency.
And I think that it's actually very good for the ecosystem that that happens, because every time there is a massive cut in price, you see the next release of, like, six to twelve open source models over the next six months that are catching up in quality, but at a smaller footprint. And so, look, I expect GPT-4 will get cheaper too over the next six months.
Daniel Reid Cahn: I do think the
coolest thing, like the kind of craziest
thing here to notice with OpenAI has
been, they have been at the forefront.
Yeah.
But there's also some phenomenon, there's like the classic, you know, four-minute mile story: the first time someone runs a four-minute mile, it's considered impossible. And then as soon as they do, other people know it's possible. And I feel like GPT-4o was like a huge shock to the ecosystem. The minute it comes out, it becomes possible. Similarly with prices. Were you shocked by GPT-4o, by the audio model, by the way?
Dev Rishi: I would think I was most surprised by the low latency. You know, that was the part that was, yeah. I think the idea that they were working on it heavily, just following what they had been doing, you know, over the last six months with multimodal video, adding in kind of audio towards that, like, that part wasn't so surprising. The piece that I think was actually the most interesting was where they were able to get to, like, kind of near-real-time response rates.
Daniel Reid Cahn: I actually so disagree.
I personally, I don't know.
I had a, so a couple of weeks back I had
Chris Gagne from Hume on the podcast.
And I asked him, actually, I don't think
it was on the podcast, but I did ask
him about when do you think we'll have
end to end language models for audio?
And I don't remember his exact
answer, but he was like, He and
I actually fully agreed on this.
I really thought we were like far away.
I thought two to three years.
Dev Rishi: And why is that? Do you think it's, like, a function of the training data? Like, model architectures don't support it well, or something else?
Daniel Reid Cahn: Like, I still have no idea how GPT-4o did it, but the biggest challenges, I'd imagine, are training data. There's way more text data out there than audio.
Yeah.
And then text data is so much cleaner.
Like we were talking about with images: text has exactly the right signal.
Most of the time, text
is grammatically correct.
It has sentence
structures, it has meaning.
Think about, like, essays.
People thought about it.
They, like, wrote the
thing they wanted to say.
It's not just, like, a speech.
Audio tends to be more
this, like, speech format.
It tends to have a lot of artifacts.
And then fundamentally, like,
architecture wise, you need some way
to tokenize the speech and then return
back from speech space back, sorry,
from token space back to speech space.
No one's really built
great ways to do that.
I think, like, OpenAI was the closest, right?
They built Whisper, which
at least was speech to text.
And they were, as soon as they built
it, it was like state of the art,
super impressive until other people
started building similar things.
Dev Rishi: Do you think Whisper was, for example, a way to be able to collect data?
To be able to start to train kind
of an end to end audio model?
Daniel Reid Cahn: Totally could be.
Dev Rishi: Like, I actually think
that if you consider some of the
work that they had done in video
generation, you think about Whisper,
like, the idea that they got the end
to end audio model was actually, for
me, a little bit less surprising.
I think.
The idea that the end to end audio
model was able to do, like, generation
in such a quick time, like, sub 200
milliseconds, that, I think, was actually
really, really, that's the part that
I would say, like, I didn't expect.
If I had to expect what I would see
from an audio model, I think I would
have expected to see something clunky.
Daniel Reid Cahn: But I, I've never even seen a high-latency audio model.
That's why I'm like, even if,
personally, if they had released
a crazy high latency audio model,
I would have had my mind blown.
Dev Rishi: You've seen some of
these startup demos, though, right?
Where somebody is, like, chatting with
somebody on a phone, like, on calls.
Daniel Reid Cahn: Text-to-speech. The language modeling right underneath it is not.
Dev Rishi: Yeah.
Daniel Reid Cahn: But I think like
the idea is like language model
generally is always pre trained
on like prediction of next tokens.
So conceptually that makes sense because
you can have a lot of examples of
like question answer message response.
Yeah.
Whereas with audio, you know,
usually most audio, if you take
a second of audio in the next
second, it'll be the same speaker.
If you were trying to learn how to have
a conversation from this podcast right
now, you know, me and you talking, That
would be so much harder than taking a
transcript of the same thing, you know?
Dev Rishi: You know, it's interesting,
again, I can't speculate on how they
made the training data, but what
you just said is like traditional
modeling is speech to text to speech.
Yeah.
And it does feel like that actually helps you create some synthetic datasets for doing end-to-end audio modeling.
Daniel Reid Cahn: So I'm betting they either used some synthetic data, or the other possibility, which I think is phenomenal if they were able to pull it off, is they learned almost entirely from text, with, like, 0.1 percent of training being speech, and then somehow got the whole thing to work. I think when you watch the demo, it's clearly not perfect speech generation, but it's clearly, like, way better than anything we've seen before. Yeah. But I also, by the way, give them a huge amount of credit, because I think they didn't try to cherry-pick the demos and try to show the best possible cases. The demos show things going wrong. They show it as a v1. They show all the problems.
Dev Rishi: Did the latency surprise you?
Daniel Reid Cahn: It was crazy surprising, very surprising. But I mean, I'm also surprised that they were able to, I mean, for a single announcement to be, you know, we made our model faster, better, and cheaper, all at once. Like, usually you get one, maybe two. But faster, better, and cheaper... I don't want to, like, you know, blow too much steam up their ass, but...
Dev Rishi: I don't know, I was shocked. I do think the "better" is the part that's getting litigated right now, though, in some of the...
Daniel Reid Cahn: Better in the sense of audio is really what I meant. Okay, in audio, that makes sense. Increased capabilities.
Dev Rishi: Yeah, yeah.
One of the things we've been doing is benchmarking GPT-4o on the same kind of fine-tuning leaderboard that we've had. And so we're going to be releasing those results in a few days, but I think it remains to be seen how much better, for sure.
Daniel Reid Cahn: Yeah, in terms of, like, accuracy, obviously. Okay, I want to ask about fine-tuning. Do you think fine-tuning is hard?
Dev Rishi: I think that there's,
I don't think it has to be hard,
but I think there's practically two
things that are hard about it today.
Okay.
The first is the data.
Like, I think that's always been
one of the struggles for it.
Which is like making sure that
you have either a good completions
dataset, maybe that's a little bit
less challenging, or a good instruction
fine tuning supervised dataset.
This was like the problem
for machine learning in 1990.
It's like still the problem with fine
tuning and machine learning in 2024.
Then the part that I think has gotten a lot easier is the tweaking of the algorithms. It used to be the case that if you were building, like, a model, it was this weird art slash highly experimental setup where you're tweaking every parameter, from, like, learning rate and batch size to your regularization lambda to, like, anything, right? Like, everything was fair game and you're throwing everything at it. With fine-tuning, the scope of the number of parameters that you actually need to adjust, I think, has gotten to be a lot smaller in order to be able to see kind of meaningfully good results.
Daniel Reid Cahn: Like what are they?
Dev Rishi: I think, like, target modules tend to matter. The LoRA rank, I think, also matters, which is, like, the LoRA rank will go ahead and correlate towards the size of the adapter and its capacity.
How many?
Yeah, exactly.
And then, you know, I think, like, classic things like the number of epochs, as well as your learning rate that you might want to work with, which is, like, not necessarily a hyperparameter as much as it is maybe a business...
Daniel Reid Cahn: What about batch size and learning rate?
Dev Rishi: Oftentimes, I think now you use, like, automatic batch sizes and automatic learning rate schedulers with fine-tuning jobs. At least that's what we typically do in Predibase as a default. We do, like, auto batch sizing, because it's really a function of the hardware that you're on, the size of the model, and the size of the datasets and input sequences. And so rather than having to do a lot of experimentation, you can do some of that in a warm-up phase.
Yeah.
So, I think the tricky part of fine-tuning there has gotten less difficult. I think the infrastructure still tends to be a bit of a pain for people.
For sure.
Daniel Reid Cahn: But I don't know, I do have to say, like, depending on the model, like, the first time our team fine-tuned GPT-4, we diverged. Like, our loss just went up instead of down.
Yeah, yeah.
And we were looking at that, we were
looking at the curve, and we were like,
Wait a minute, isn't this curve supposed
to go down, like, decrease loss, not up?
Yeah.
It could be, I mean, GPT-4 is massive, so fine-tuning it is particularly tricky. But I mean, we do think about parameters, I mean, not a lot: I think we think about rank, we think about learning rate, we think about batch size, and we think about epochs.
Yeah.
Epochs you can compare with checkpoints
because you just train multiple Right.
Exactly.
We do find learning rate
can make a huge difference.
Yes.
And in a way that, you know, a difference that's impossible to really detect, like, it's really hard to know. You're like, hmm, this one seems smarter, this one seems nicer, and you're like, that makes no sense, because all I did was change the learning rate.
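For reference, the handful of knobs in this exchange, rank, target modules, epochs, batch size, and learning rate, map onto just a few fields in a typical open-source training setup. The sketch below uses Hugging Face's TrainingArguments as a stand-in; it is not Predibase's defaults, and every value is illustrative.

```python
# Minimal sketch of the training knobs discussed above (values are illustrative).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,           # compare checkpoints across epochs rather than guessing
    learning_rate=2e-4,           # the knob that still makes a surprisingly big difference
    lr_scheduler_type="cosine",   # schedulers reduce, but don't remove, learning-rate tuning
    warmup_ratio=0.03,
    auto_find_batch_size=True,    # back off the batch size until it fits the hardware
)
```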
Dev Rishi: And, to be fair, like, I think learning rate schedules aren't perfect, so I actually do think playing with learning rate is a fundamental factor. But, like, I mean, the biggest difference I think about is, I remember in early versions of Predibase where you could build any, like, deep learning model. And then you have to think about the whole architecture, you have to think about, like, learning rates of sub-parameters, you have to think about dropouts and, like, regularization rates, you have to think about so many different things: choosing your activation function and so on. And that was before you got to a good enough level.
Like that was before you even started
to see something like, you know, it
wasn't like the optimization step.
It was like the, I want to
see any value step at all.
Yep.
Today, I just chatted with somebody who, you know, is a former colleague of mine and can be critical. Like, he's honest, I would say, about products, and it was like, hey, what was your experience on Predibase? And, you know, he told me a couple of things he thought could be improved, but his main point was: I didn't expect it to be this easy. And he said, I didn't expect to like it this much either, in terms of just the fine-tuning piece, because now you actually can fine-tune.
And usually there are a lot of these parameters you want to tweak, but that first model that you fine-tune in a platform like ours, or if you fine-tune, you know, externally with the right kind of default settings, you're probably going to actually immediately get a lift. That was LoRA Land, and it was all our basic defaults. We didn't optimize the hyperparameters at all, which is kind of crazy. And I think that lack of a hyperparameter-optimization need is probably the part of fine-tuning that has gotten easy, and why fine-tuning is no longer very difficult. The parts that are hard are the data and the underlying infrastructure, not the algorithms.
Daniel Reid Cahn: I mean, I think the
data is really freaking hard, though.
I saw you guys did some
partnership with Gretel.
Dev Rishi: Yes, yeah, we just did
a webinar, I think, a couple days
ago in a partnership with Gretel.
And I think one of the main motivations
there is I think synthetic data is
getting better and better kind of around
this training workflow and we see a
lot of people use it in different ways.
So, anything that helps the dataset creation side of it, I want to be able to be very front and center of, because that means more people can start to use us for fine-tuning.
Daniel Reid Cahn: Totally agree.
You know, on the data side, I have to say, the challenge for us on the data side, I definitely think, like at Slingshot, our biggest challenges on machine learning are definitely data-related. But the way that we frame it is basically: we are trying to achieve, and I think this is true of a lot of AI companies, a task for which no data exists.
So if you're training a general purpose
chat assistant like ChatGPT, or if
you're trying to train Hume or whatever
kinds of specialized, you know, a
legal assistant, those don't exist.
Right?
There are no legal AIs, right?
And so if you're trying to show, like,
what would a great answer from, you
know, an AI doctor sound like, well,
you can look at what doctors say.
But doctors, ha, doctors make mistakes all the time.
Doctors forget to ask the right question.
They're constrained by the
amount of time they can spend.
They can't write out long answers
because of those constraints.
They, you know, et cetera, et cetera.
Yeah.
And so you're basically like, if I could
just find a data set of a billion examples
of people going to the doctor, asking a
question, getting a perfect answer, right?
Boom.
All I need to do is walk over to
Predabase and train with that data.
That would be phenomenal, right?
And I'm sure you guys can handle that.
But where the hell do I
get a billion examples?
Like, even if I got a billion examples
of doctors talking to patients, I still
wouldn't have, you know, an AI doctor.
Dev Rishi: Yeah, I don't have
an easy answer here, honestly.
And like, this is, again, where I feel
like I'm most interested in, like,
tools for the data infrastructure
side of things to be able to advance.
What I will say is, we see people, maybe not as complicated as, like, there's no such thing as an AI doctor, but we see people that are like, I don't have labeled data.
And the tricky thing is one way or
another you have to bootstrap it.
One way that we've seen people bootstrap
it is okay, the risky way you can
bootstrap it is you go to a subset of your
traffic and just launch with something
subpar and you collect like, you know,
like some user signal on kind of what
you want to be able to get feedback from.
That could be one approach, depending
on which industry you're in.
One thing we've seen is people actually find GPT-4 quality, with some edits in post-processing, is actually roughly where they might want to be.
So maybe the case is like you can get,
and this is not like, it's something
I've seen like as a repeating pattern.
Maybe the case is like GPT 4
can sound close enough to an AI
doctor for what you might want.
Maybe that's not the case, but
like in some cases GPT 4 is at
the quality where you'd want.
And the real concern is, like, I just can't use GPT-4 live in production, because, you know, it's one of the most expensive models that exist out there. It's really slow and rate-limited, and can be, like, maybe outside the organizational policy. So we've seen people bootstrap with GPT-4 for data collection, then distill that down into a smaller open source model.
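A minimal sketch of that bootstrapping pattern, labeling raw examples with GPT-4 and writing them out as a fine-tuning dataset for a smaller open-source model, might look like the following. The prompt, label scheme, and file format are illustrative assumptions, and the human review and post-processing edits Dev mentions would happen before examples are kept.

```python
# Hedged sketch: use GPT-4 as a "teacher" to label raw examples, then save an
# instruction-style JSONL file that a smaller model can be fine-tuned on later.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = "Classify the email as COMPLIANT or NON_COMPLIANT and explain briefly."

def label_with_gpt4(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

unlabeled_emails = ["...", "..."]  # real traffic would go here

with open("distilled_train.jsonl", "w") as f:
    for email in unlabeled_emails:
        label = label_with_gpt4(email)  # teacher output; review/edit before keeping
        f.write(json.dumps({"prompt": email, "completion": label}) + "\n")
```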
Daniel Reid Cahn: Yeah.
I was going to ask about distillation. I mean, one thing I also want to ask about: I tend to be very optimistic about AI and also pretty pessimistic, just because of, like, the AI workloads out there in the world. Like, AI is, you know, still pretty much hype.
Like, I'm very excited for AGI.
I'm very excited for where we're going.
I think technology is freaking phenomenal.
I think it's exciting.
You could really see progress, but most
people I don't think should be using
AI at work all that much personally.
Like, I don't, I don't know about like
AI writing, all that kind of stuff.
And one thing that gives me a
little pause here is that it does
seem like a lot of the time, the
use cases we're talking about with
AI are still kind of the old ones.
I know you guys pivoted.
So, pre-pivot, you were focused on those churn prediction, revenue projection type things.
Dev Rishi: Yeah, I mean pre pivot we
were focused on, we're an ML platform,
we can help you do any type of model.
Which, by the way, is a very
broad like value proposition.
The types of things people would come with
was, yeah, like churn prediction, lifetime
value, like these are the types of things
people knew to use machine learning on.
Daniel Reid Cahn: Yeah.
Dev Rishi: And so we saw a lot of that.
So
Daniel Reid Cahn: I wonder, similarly,
like, you know, you were excited about
deep learning because you've got it.
You're like, Oh my God, this
thing is like intelligent.
It could do anything.
Why would you want to calculate
customer lifetime value when you
could literally like understand
everything about your customer?
Isn't that way more interesting?
So similarly, I mean, do you think some of the use cases that we're imagining for AI are just limited by this same deep learning, past point of view? Like, you know, before, we could do lifetime value prediction; now we can still do it, but with deep learning. You know, the analogy being, like, I can go through a blog post and find, you know, a list of tags for SEO.
And it's like, yeah, yeah.
But is that really why
we want to build AI?
Dev Rishi: Yeah, but I'm actually
not convinced that's a bad thing.
Which is to say like, I think that over
a short period of time, like we're so
early within the phases, we have to
remember that most people didn't really
have a good lifetime value return
prediction like machine learning model.
That's why they were looking for these
innovative solutions to be able to do it.
I talk to customers still like, you
know, that have to do classification
tasks or extraction tasks over email.
The way that this happens today
is there is a back office function
somewhere that's like, you know, maybe
offshore going through a lot of these.
So is it as flashy to say, like, you know, what we're going through is blog posts and extracting tags for SEO, we're going through emails for compliance, we're going through transcripts to be able to do that? No. But we have to recognize that the industry is in the early stages of AI adoption, full stop, whether that's, like, deep learning, GenAI, or anything else along those lines.
And I actually think the biggest
thing we can do is start to solve
some of these narrow, boring tasks
so we can get to the cooler stuff.
What I actually don't love is when
you see like this really interesting
flashy demo from a company that's
like, here's our bot that's going to
teach us how to make more revenue.
But it's like, all right, have you
figured out how to like even just
solve customer support or something
else along those lines, you know,
these classic use cases within it.
So, to me, we were trying to say, like, this is going to be a thing, in 2021, using what is maybe now an older class of technologies. Not all of them are bad, like Longformer, still actually quite effective models, but, like, using an older class and trying to show this is the value.
Daniel Reid Cahn: Yeah.
Dev Rishi: And now, if we can get, you know, and I would say, like, maybe 1 percent of organizations got it, right? Like, what was the number of organizations using deep learning pre-GPT-4? Not many. And so what we can do is increase that 1 percent to, like, 20%, 30%, or even 50 percent doing those tasks. I think that's a massive win over the past year.
Daniel Reid Cahn: So there's, like, the boring work that still needs to get automated. Yeah, let's do the boring work. Let's just get it over with.
Dev Rishi: Yeah, let's do the
boring work because that boring
work is taking up a lot of time.
And, like, I actually almost think it's not necessarily, do we want to go ahead and skip five steps forward, or do we want to go ahead and, like, build and solve the problems that are sort of, you know, in front of us today? What I am actually personally just most excited about: for 20 years, people have been talking about being able to automate some parts of these workflows, and now we actually see, you know, more than just the most advanced companies trying to do that.
And I think that's, that's a great
place for us to land as an industry,
I think, over the next few years.
Daniel Reid Cahn: I hear that.
I was, I was talking to my
father in law earlier today.
He runs an IT company in the Bay Area.
And he was telling me, like,
we're finally adopting AI for,
like, our ticket understanding.
A lot of our, you know, agents get these, like, support tickets, and they have to read through the whole thing and they miss the context. And you click one button, and AI can, like, read it, understand it, make it so much faster; customers get support faster, our team is happier.
And I'm like, yeah, that's nice, but, you
know, what do you really want from AI?
And he was like, I want AI that could solve the ticket, you know?
Dev Rishi: Yeah, exactly. And do you think that that's too narrow or non-creative, or how do you think about that?
Daniel Reid Cahn: I don't know. No, I mean, I think, like, he's very pragmatic.
Yeah.
Like, obviously, similar to what you're describing, would he want, you know, Predibase to be able to actually help him host models that solve his use case? Like, of course he wants to solve his problems now; he has problems now that AI can solve now.
Yeah.
But it also seems like if AI right
now can go read through a boring
ticket and understand it enough
to get a human to move from taking
10 minutes to solve it to five.
Yeah.
And then, you know, very soon
it moves from 10 to zero.
You know, there does seem like... You know, the exciting thing to me, I have to at least be excited, from a sci-fi point of view, about this leapfrogging, about the point at which this ticket, you know, the person emails in and they're like, I'm having trouble with signing into Zoom, and then the guy says, like, oh yeah, here are three things you can try. And, like, GPT-4 isn't there yet. Like, he did experiment with, like, what if GPT-4 wrote the answer? It's just not there yet, right? Yeah, his team is smarter. But he also knows, you know, for a lot of those tickets, we can actually fully automate them, just with technology that's not quite here yet. That's kind of where I'm, you know...
Dev Rishi: I think the place that I come from is, like, I remember getting into, like, democratizing AI in 2016, 2017. And I think that AI has never been underhyped. Like, people talk about the hype cycle for AI in 2020, in 2022, in 2023. We started Predibase in '21, and I remember thinking, man, we're in machine learning, this is like one of the hottest spaces in the world. And I remember thinking that when I was at Kaggle in 2018 and 2019. It's never been underhyped. I would say it's consistently underdelivered in economic value outside of the top percentage of companies.
And so if the way we get there is
we solve some of these narrow use
cases, I would love that a lot more
than we kind of build the next series
of kind of sci fi esque use cases.
But, you know, every company, like, you know, the construction companies are still consistently going through every manual invoice themselves.
Daniel Reid Cahn: Are you a sci-fi guy? Just curious.
Dev Rishi: I do like sci fi.
Okay.
Yeah, yeah, I do like sci fi.
Daniel Reid Cahn: I mean, look, I think there's...
Dev Rishi: I like sci-fi, but I've spent so much time with the customers that are like, look, AI seems great, but here's, like, here's the problem that I actually have.
And it's the same things that we
haven't solved in 2016, 2018, 2020.
And now I see the solution like here.
Daniel Reid Cahn: Although I don't
know, I, I go back and forth.
Cause I do get it.
I do get that.
Like, I think we are going to go
through some sort of market crash,
some bubble popping, some, you know,
investors saying, show me the money.
What's going on.
Yeah.
On the other hand, you look at these, AI
hype cycles, like we're talking about,
and you're like, it's not really a cycle.
It's just been hyped.
But the truth is, like, it was
pretty hyped during the big data era.
Yeah.
Then it got more exciting during
the ML era, and then it got more
exciting when Transformers came
out, and then it got more exciting
when GPT came, GPT Four came out.
Right.
It's not like we've gone through some
hype cycle where it was exciting and then
not, and then more exciting and then not.
It seems like it's only
gotten higher and higher.
And the reason why is because I think, like, we moved from a place where, like, AI can automate back-office tasks to now AI doctors. Like, I also wonder, just economically, if we actually delivered on all the back-office tasks, would that be nearly enough to account for the investment and the hype and the excitement? Or do we really need AI doctors for, you know...
Dev Rishi: That's a good question.
I feel like we should put a top
three consulting or accounting
firm on that to figure out like
what the costs directly are.
The nice thing is once you solve with AI,
it's kind of like recurring value, right?
And so like whatever you solved in a
year, like you basically get the same.
It's like the beautiful thing
about Sass in some ways.
So I imagine over some horizon
of time, the answer is probably
yes, but it'd be interesting to
compare that against investment.
I mean, I will say, like, I'm very excited about the really cool end-to-end multimodal models that we have. I've seen these end-to-end physical world models, too. Like, I think there's some amazing areas where AI is gonna go.
Daniel Reid Cahn: Self-driving cars, by the way. Where are all the self-driving cars?
Dev Rishi: I just think that, like, I actually think that the ecosystem is not currently limited in imagination on that.
Daniel Reid Cahn: Okay.
Dev Rishi: And so I think,
like, the funding exists for
those types of environments.
I think that the progress is happening.
Maybe it could happen faster.
I think the progress is happening
kind of in a startup landscape there.
And so I'm not so concerned that, you know, the Fortune 500s of the world are probably starting to think a lot more about back office task automation. Because I think by the time they figure that out, I'd love for some of these physical world models to then be like, okay, and here's how we can actually help you. Like, end-to-end physical world models? Incredible. Like, incredible as a line of thinking, you know.
Do I think there'll be something
someone could use there in six months?
No, and I don't want AI to be a disappointment just because, you know, they haven't quite gotten there yet.
Daniel Reid Cahn: Yeah.
I mean, I also think, like, the reason why I'm excited about Predibase is because I think there are a lot of sci-fi models that we can actually deliver on, hopefully on Predibase, meaning that we can get to that point of, like, you know, not just that we were able to do some tag identification, but, you know, there are probably a lot of really high impact things that can be delivered with fine tuning.
Totally.
I think, you know, I do think there has been some lack of creativity, personally. Like, I hear way too often when I talk to people about AI, you know, this idea, like you said, about 98 percent of GPT-4 requests being on the base model.
Most people I talk to assume,
yeah, that AI is just base models.
I hear a ton from AI engineers about, you know, I think RAG has sort of fallen off a bit, but for a while that was like, RAG, RAG, RAG, don't ever fine tune.
And I'm wondering, like, is it
just because fine tuning is hard?
You know, were people not
targeting hard enough use cases
to bother with fine tuning?
Would Predibase make more money if people tried more hard shit on Predibase?
Dev Rishi: I actually just think that
people think fine tuning is hard.
I don't think it actually is hard.
I think it's hard because the data...
Daniel Reid Cahn: ...is what I mean to say. The data is, yeah.
Like, if I said, I can call GPT-4 to anonymize some text, versus I can fine tune a model to anonymize some text. Calling GPT-4 is so easy. It is so insanely easy. For the latter, maybe I have to talk to Gretel, create some synthetic data, train a Llama model on Predibase. Like, that could take me real effort and thought.
Dev Rishi: For sure, but I
think that's actually why no one
starts off fine tuning a model.
They always start off with, like, GPT-4, you know, or GPT-3.5, and then the base open source model, and then fine tuning is kind of next in that progression life cycle. And I am actively thinking about ways we can move fine tuning up closer to the beginning, or at least make it so that the data prep requirement becomes less painful for the user, kind of on their side. But I think the fact that this progression exists is, like, very logical. And to me, some of the limitation in thinking is just, we have this cascade of people that are going through these steps still, and I feel like we've just started to see the fine tuning wave starting right now. Like, it's the very early days of where people are like, fine tuning actually makes sense.
Daniel Reid Cahn: Yeah,
Dev Rishi: And, like, I think launches like, you know, the Databricks paper on LoRA fine tuning versus full fine tuning, and, I think, LoRA Land, I think all these things help advance that ecosystem.
Daniel Reid Cahn: Also, just like this, shout out to Harvey fine tuning 10 billion tokens for legal.
Yes, their whole point is about fine tuning. They are fine tuning. Everyone likes to make these distinctions, but that is fine tuning. What they're doing is fine tuning, and that's phenomenal, because they're trying to take on an insanely hard use case by getting 10 billion tokens of data.
That's not easy.
Exactly.
I feel like again, not the training part.
I think training, we could debate
how hard it is, but the point is the
data is definitely the hard part.
Dev Rishi: Yeah.
I feel like a year and a half ago, the thing was pre training, and that wave actually didn't really make a lot of sense to me, because the vast majority of organizations probably don't need to pre train. Some definitely do, but the vast majority definitely don't. But fine tuning, I think, makes all the sense in the world to me. I feel like we're in the very early innings of that, and so I think we're gonna see a lot more creativity in terms of the ideas that get brought out.
The direction I'm most interested in, the sci-fi-like creativity, is: right now people still think about fine tuning as, like, one model per use case. Like, I'm going to fine tune a model to do this. What I'm really interested in is, like, we have the ability to fine tune many models very easily, very cheaply, create these adapters, and now with LoRAX you can serve many of these adapters very cheaply. So what does it look like if you didn't have, like, one fine tune, but you had a hundred fine tunes, or four hundred fine tunes, that all do slightly different things, maybe, but can be served just as easily as a single base model? And then what you really have is, you know, what I think some of these mixture of experts models have exploited, which is the ability to understand what part of a model architecture you actually want to use to answer a question.
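A minimal sketch of that "many fine tunes, one deployment" idea, assuming a LoRAX-style server where a single base model is deployed once and each request can name a task-specific adapter. The endpoint URL, adapter IDs, and response fields below are illustrative assumptions, not a documented API.

```python
# Sketch: many task-specific adapters served from one shared base model.
# Assumes a LoRAX-style HTTP endpoint; the URL, adapter IDs, and response
# shape below are illustrative assumptions, not a documented contract.
import requests

LORAX_URL = "http://localhost:8080/generate"  # hypothetical deployment

def generate(prompt: str, adapter_id: str | None = None) -> str:
    """Query the shared base model, optionally applying a fine-tuned adapter."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 128}}
    if adapter_id is not None:
        # Only this field changes between the hundred-or-so fine tunes.
        payload["parameters"]["adapter_id"] = adapter_id
    resp = requests.post(LORAX_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Different tasks hit the same deployment, just with different adapters:
triage = generate("Ticket: I can't sign into Zoom.", adapter_id="support-triage-v1")
reply = generate("Draft a reply: I can't sign into Zoom.", adapter_id="support-reply-v1")
```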
Daniel Reid Cahn: So can you imagine for me, like, even if it were 10, like, how would you use 10 LoRAs for a use case?
Dev Rishi: Yeah.
So I think that there's two ways that I can think about it, but let's just imagine that customer scenario. Like, let's imagine you just wanted a customer service bot, right? Like, customer service is actually many different tasks, right? There's like, hey, what is the triage level and priority for this? What is the routing area that it should go to? How do I write a response back to this user? You know, if they want to cancel their subscription, how do I handle that? It's like a lot of these different types of tasks.
Right now I think that your two options are, you, like, try to fine tune models per task, and then deploy all those and figure out an orchestration for them all that lives in business logic, which is like, hey, first do this step and then do this step.
Daniel Reid Cahn: That would be like, first, classify the email. Okay, it looks like a cancellation email. Yeah, run the cancellation bot and then run the response bot.
Dev Rishi: Yeah, yeah. And then, like, obviously, there's maybe 16 other steps that go along with it.
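That "orchestration living in business logic" option might look something like the sketch below, reusing the hypothetical generate() helper from the earlier sketch; the labels, prompts, and adapter names are assumptions for illustration.

```python
# Sketch: one fine tune per task, chained together with hard-coded rules.
# The labels, adapter names, and prompts are hypothetical placeholders.

def handle_ticket(email_text: str) -> str:
    # Step 1: classify the email with a task-specific fine tune.
    label = generate(f"Classify this support email: {email_text}",
                     adapter_id="email-classifier-v1")

    # Step 2: branch in plain business logic.
    if "cancellation" in label.lower():
        details = generate(f"Extract the order ID and reason: {email_text}",
                           adapter_id="cancellation-extractor-v1")
        return generate(f"Draft a cancellation confirmation using: {details}",
                        adapter_id="response-writer-v1")

    # ...plus the other sixteen-odd branches the conversation alludes to.
    return generate(f"Draft a support reply to: {email_text}",
                    adapter_id="response-writer-v1")
```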
Daniel Reid Cahn: And then the problem is that the person says, like, okay, cancel my order, but only if it's not able to arrive by Labor Day, 'cause I'm not in town on Labor Day.
Exactly.
But if you're able to tell me that
it's gonna be the day before Labor
Day, then I can tell my neighbor to
come by the house and pick it up.
Yeah.
Yeah.
And they're just like,
oh shit, what do we do?
And then...
Dev Rishi: They're gonna have, and then they're gonna say something like, oh, by the way, what delivery mechanism is it gonna be, is it gonna be, like, FedEx, or is it gonna be dropped off at my door? Like, they have all these, like, interspersed questions, right?
And so I think solving a use case like that, it weirdly reminds me of building old-school assistants. Like, I worked on the Google Assistant for a little while too, right? And the way you used to do it would be, basically, you map everything to an intent, you map intents to, like, fulfillment logic and slot filling. And it kind of reminds me of that. You have to, like, build a deterministic logic of, like, fine tuned model X to this, fine tuned model Y to this.
Daniel Reid Cahn: So that was like, someone says, turn the light off, turn the lights off. And then it's like, okay, this person's trying to, like, change the status of the lights. Which way? Oh, off. Off is one of the on/off options.
Yeah.
Dev Rishi: That's the way we built a
lot of AI systems historically, right?
Conversational AI, in particular.
But I think what would be really interesting is if you had specialized adapters that were well trained for all of these different tasks, served on top of a single base model, and kind of a router logic, like a routing layer, that understood, for a given thing that the user is trying to do, which specialized model should I go ahead and hand this off to?
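One way to picture that routing layer, continuing the hypothetical sketches above: a small router model, which could itself be just another cheap fine-tuned adapter, picks the specialized adapter, and the shared deployment does the rest. All IDs and prompts here are assumptions.

```python
# Sketch: a routing layer over specialized adapters on one base model.
# The router is itself just another adapter here; all IDs are hypothetical.

ADAPTER_ROUTES = {
    "triage": "support-triage-v1",
    "route": "support-routing-v1",
    "cancel": "cancellation-extractor-v1",
    "reply": "response-writer-v1",
}

def route_and_answer(user_message: str) -> str:
    # Ask a small router model which task this request needs.
    task = generate(
        f"Pick one task from {sorted(ADAPTER_ROUTES)} for this request:\n{user_message}",
        adapter_id="task-router-v1",
    ).strip().lower()

    # Fall back to a generalist reply-writer if the router is unsure.
    adapter_id = ADAPTER_ROUTES.get(task, "response-writer-v1")
    return generate(user_message, adapter_id=adapter_id)
```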
Daniel Reid Cahn: So, I'd love to dive deeper there, but I know we're running out of time. I want to ask just some last questions. Curious, first: AGI. Do you have plans for AGI?
Dev Rishi: Do I have plans for AGI? Retire early, I think? I'm not sure, actually. So with AGI, at Predibase we build infrastructure and tooling, so I would love for people to get closer and closer towards AGI building fine tuned LLMs, maybe using this routing approach that I was talking about. I feel like what I'm still missing is a really good, crisp definition for AGI. If it's just the Turing test, I feel like we're probably, you know, around that right now.
Daniel Reid Cahn: Contentious issue. A contentious one on this podcast: like, how close are we to passing it? Well, is the Turing test outdated? But I, I...
Dev Rishi: Whether it's outdated or not, I think the Turing test especially is, like, a measure of, I would say, some bar around intelligence.
Daniel Reid Cahn: Sure.
Dev Rishi: I feel like we're hovering around it, whether we've passed it or not. No one talks to GPT-4 and says, this is, you know, the furthest thing I've ever heard from a human. Like, there is some kind of criteria where they're close, but...
Daniel Reid Cahn: Let me just, just for your sake for now, let's stick with one definition, which is like: work that can be done by a human remotely can be done by an AI.
Dev Rishi: Yeah, I think we're very, I think in some, like, it depends on the work, but I think we're very close to that.
Daniel Reid Cahn: Can your job be done remotely?
Dev Rishi: No, but I wish.
Daniel Reid Cahn: I don't know, I, I think this is like my father-in-law, you know, his type of company, you know, he'd be very much affected. But, like, the other way to look at it would be the point at which humans become useless. Like, where you're like, hey, I would love to solve world hunger, and then you have this AI that's like, that's cute, you know?
Right.
Like, leave it to me, because
I'm smarter and faster, and
Dev Rishi: I think the way that the debate usually breaks up is, like, one, that would be terrible and there'd be mass unemployment, among other things; and then the second being, humans will find the next series of things that they want to go ahead and spend time doing. Similar to, like, the Industrial Revolution: I'm no longer spending time on agriculture, what do I spend time on?
Daniel Reid Cahn: Yeah.
Dev Rishi: And I probably put myself a
little bit more in the latter bucket,
in terms of like, I think that if we
end up like getting to the area where
AGI establishes some base level of
productivity for the overall economy,
then I think people will choose
different ways to spend their time.
Some of them will be productive and
expand the Pareto Frontier, some
of them will be for leisure, and I
think both of those are good things.
Daniel Reid Cahn: But you're
gonna go for the latter.
You're gonna retire?
Dev Rishi: We'll see. We'll see how everything goes with, like, this next phase with Predibase. But, you know, I think that I would love to be able to have a...
Daniel Reid Cahn: I heard Sam Altman say at some point, I don't know if he was being facetious, but he was like, I'll just have a lot of kids.
Dev Rishi: If that were the case, I mean, I think if I got to the point where, like, AGI was able to do, let's say, 70 percent of my role.
Daniel Reid Cahn: Yeah.
Dev Rishi: I don't think I'd backfill that 70 percent with just work. I think I'd, like, mix in some work-life balance in there. More work-life balance than I'd say I have today.
Daniel Reid Cahn: Very nice.
And then just in terms of, like, resources for keeping up with AI, it sounds like you have some. If you had, like, an ML engineer looking for stuff to check out, things to read?
Dev Rishi: Yeah, I mean, like, there's some obvious things. Like, I think Andrew Ng's been putting together some great, like, short-form courses on YouTube with DeepLearning.AI.
We did one on efficient serving for LLMs.
We have a fine tuned LLM newsletter.
And I hate to say it, but I
actually think I get a lot of
my LLM news on Twitter still.
Daniel Reid Cahn: And...
Dev Rishi: It's, like, weirdly actually something where, like, probably everyone I follow right now is an ML influencer of some sort or the other. But the space is moving so quickly, I feel like things just get, like, organically shared on social, so.
Other newsletters I read include The Sequence and a few others, but I would say a place to start would probably be to check out our fine tuned newsletter, like, from Predibase.
Great name.
Yeah, exactly.
We have shirts that say "the future is fine tuned," so that's what we called our newsletter.
Daniel Reid Cahn: I love that. Yeah. That is such a great idea, to get a t-shirt.
Dev Rishi: We'll get you a shirt. Yeah, we'll make sure that you have one.
Daniel Reid Cahn: Alright,
well this was awesome, Dev.
Thanks so much for joining us.
Dev Rishi: Yeah, you got it, Daniel.
Thanks for having me.
Daniel Reid Cahn: Awesome.