
  • everything is open source, with the exception of the data

    If I distribute a bundle consisting of an emulator and a ROM of a closed-source game (without the source code), then the full bundle is not open source.

    So if DeepSeek removed its data set, would you then consider DeepSeek open source?

    Kind of, but that's like shipping a console without any firmware. The weights are the important bit of an LLM distribution.
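    To make that concrete, here's a minimal sketch (assuming the huggingface_hub and transformers Python packages; the repo id is purely illustrative) of what such a weights-only release actually ships and how it gets consumed:

        # Sketch: inspect and run a weights-only LLM release.
        # Assumption: pip install huggingface_hub transformers; the repo id is illustrative.
        from huggingface_hub import list_repo_files
        from transformers import AutoModelForCausalLM, AutoTokenizer

        repo = "deepseek-ai/DeepSeek-R1"
        print(list_repo_files(repo))
        # -> config.json, tokenizer files, and a long list of *.safetensors shards.
        #    No training corpus, no data pipeline, no training scripts.

        # Everything needed to *run* the model is in those artifacts ...
        tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
        # ... but nothing in them lets you rebuild the weights from scratch.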

  • So an emulator can’t be open source if the methodology by which the developers discovered how to read Nintendo ROMs isn’t documented?

    No. The emulator is open source if it supplies the way to get to the binary in the end. I don't know how else to explain it to you: no LLM is open source.

  • it’s the entirety of the bulk unfiltered data you want

    Or more realistically: a description of how you could source the data.

    doesn’t touch at all on how this LLM is different from other LLMs?

    Correct. Llama isn't open source, either.

    like saying that an open source game emulator can’t be open source because Nintendo games are encapsulated

    Not at all. It's like claiming an emulator is open source because it has a plugin system, while it needs a closed-source build dependency that the developer doesn't disclose to the public.

  • So, Ocarina of Time is considered open source now, since it's been decompiled by the community, or what?

    Community effort and the ability to build on top of stuff don't make anything open source.

    Also: initial training data is important.

  • They published the source code needed to run the model.

    Yeah, but not the code needed to train it.

    anyone can download the model, run it locally, and further build on it.

    Yeah, it's about as open source as binary blobs.

    Training from scratch costs millions.

    So what? You can still glean something if you know the dataset on which the model was trained.

    If software is hard to compile, can you keep the source code closed and still call the software "open source"?
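    The "binary blob" comparison above can be made literal. A short sketch (assuming the safetensors package and a locally downloaded shard; the shard filename is illustrative) of everything that inspecting the published weights actually gives you:

        # Sketch: "reading" the published weights is roughly hexdumping a binary blob.
        # Assumption: pip install safetensors torch; the shard filename is illustrative.
        from safetensors import safe_open

        with safe_open("model-00001-of-000163.safetensors", framework="pt") as f:
            for name in list(f.keys())[:5]:
                tensor = f.get_tensor(name)
                print(name, tuple(tensor.shape), tensor.dtype)
        # You get tensor names, shapes and raw numbers: inspectable, patchable,
        # redistributable. What you don't get is the "source" that produced them,
        # i.e. the corpus and the training recipe.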

  • Seems kinda reductive about what makes it different from most other LLMs

    The other LLMs aren't open source, either.

    isn’t that just trained from the other AI?

    Most certainly not. If it were, it wouldn't output coherent text, since LLM output degenerates if you human-centipede its outputs.

    And the way it uses that data, afaik, is open and editable, and the license to use it is open.

    From that standpoint, every binary blob should be considered "open source", since the machine instructions are readable in RAM.

  • Let's transfer your bullshirt take to the kernel, shall we?

    The kernel is instructions, not code. It’s perfectly fine to call it open source even though you don’t have the code to reproduce the kernel from scratch. You are allowed to modify and distribute said modifications so it’s functionally free (as in freedom) anyway.

    🤡

    Edit: It's more that so-called "AI" stakeholders want to launder its reputation with the "open source" label.