
  • everything is open source, with the exception of the data

    If I distribute a bundle consisting of an emulator and a ROM of a closed-source game (without the source code), then the full bundle is not open source.

    So if DeepSeek removed its data set, would you then consider DeepSeek open source?

    Kind of, but that's like shipping a console without any firmware. The weights are the important bit of an LLM distribution.
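    To make that concrete, here's a minimal sketch (assuming the huggingface_hub and transformers Python packages; the repo id is purely illustrative) of what such a weights-only release actually ships and how it gets consumed:

        # Sketch: inspect and run a weights-only LLM release.
        # Assumption: pip install huggingface_hub transformers; the repo id is illustrative.
        from huggingface_hub import list_repo_files
        from transformers import AutoModelForCausalLM, AutoTokenizer

        repo = "deepseek-ai/DeepSeek-R1"
        print(list_repo_files(repo))
        # -> config.json, tokenizer files, and a long list of *.safetensors shards.
        #    No training corpus, no data pipeline, no training scripts.

        # Everything needed to *run* the model is in those artifacts ...
        tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
        # ... but nothing in them lets you rebuild the weights from scratch.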

  • So an emulator can’t be open source if the methodology by which the developers discovered how to read Nintendo ROMs isn’t documented?

    No. The emulator is open source if it supplies the way to get to the binary in the end. I don't know how else to explain it to you: no LLM is open source.

  • it’s the entirety of the bulk unfiltered data you want

    Or more realistically: a description of how you could source the data.

    doesn’t touch at all on how this LLM is different from other LLMs?

    Correct. Llama isn't open source, either.

    like saying that an open source game emulator can’t be open source because Nintendo games are encapsulated

    Not at all. It's like claiming an emulator is open source because it has a plugin system, while it needs a closed-source build dependency that the developer doesn't disclose to the public.

  • So, Ocarina of Time is considered open source now, since it's been decompiled by the community, or what?

    Community effort and the ability to build on top of stuff don't make anything open source.

    Also: initial training data is important.

  • They published the source code needed to run the model.

    Yeah, but not the code needed to train it.

    anyone can download the model, run it locally, and further build on it.

    Yeah, it's about as open source as binary blobs.

    Training from scratch costs millions.

    So what? You can still glean something if you know the dataset on which the model was trained.

    If software is hard to compile, can you keep the source code closed and still call the software "open source"?
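    The "binary blob" comparison above can be made literal. A short sketch (assuming the safetensors package and a locally downloaded shard; the shard filename is illustrative) of everything that inspecting the published weights actually gives you:

        # Sketch: "reading" the published weights is roughly hexdumping a binary blob.
        # Assumption: pip install safetensors torch; the shard filename is illustrative.
        from safetensors import safe_open

        with safe_open("model-00001-of-000163.safetensors", framework="pt") as f:
            for name in list(f.keys())[:5]:
                tensor = f.get_tensor(name)
                print(name, tuple(tensor.shape), tensor.dtype)
        # You get tensor names, shapes and raw numbers: inspectable, patchable,
        # redistributable. What you don't get is the "source" that produced them,
        # i.e. the corpus and the training recipe.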

  • Seems kinda reductive about what makes it different from most other LLMs

    The other LLMs aren't open source, either.

    isn’t that just trained from the other AI?

    Most certainly not. If it were, it wouldn't output coherent text, since LLM output degenerates if you human-centipede its outputs.

    And the way it uses that data, afaik, is open and editable, and the license to use it is open.

    From that standpoint, every binary blob should be considered "open source", since the machine instructions are readable in RAM.

  • Let's transfer your bullshirt take to the kernel, shall we?

    The kernel is instructions, not code. It’s perfectly fine to call it open source even though you don’t have the code to reproduce the kernel from scratch. You are allowed to modify and distribute said modifications so it’s functionally free (as in freedom) anyway.

    🤡

    Edit: It's more that so-called "AI" stakeholders want to launder its reputation with the "open source" label.