
Posts
0
Comments
146
Joined
2 yr. ago

  • Training data for these models used to be text scraped from the internet, plus some manually written Q&A examples to make them behave more like a chatbot (instruction tuning). Because there is still a need for more data, they have started adding AI-generated text to the dataset. This technique doesn't add new knowledge, but it has been shown to reduce hallucinations, likely because this data is more focused, truthful and structured than the median text in the existing datasets. They would probably have data from every major chat provider in there, especially the big boys.

  • The size of the updates and also the size of the game itself might be due to how it is packaged. You want data that belongs together and is accessed together to be stored together. For example, the game might have one file per level that is loaded and kept in memory when you enter that level. You might even store the same asset multiple times if that makes it easier to read sequentially. This optimization is less necessary in the era of SSDs, but you don't want your game to be completely unplayable for people who still run it from a hard drive.
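
    To make the trade-off concrete, here is a rough sketch in Python. It is not how any particular engine packages its data; the asset names, sizes, and the helpers `pack_levels` and `read_asset` are all made up for illustration. Each level gets one contiguous blob, and an asset shared between levels is copied into every blob that needs it.

```python
def pack_levels(assets, levels):
    """Build one contiguous blob per level, copying shared assets into each."""
    packs = {}
    for level, names in levels.items():
        blob = bytearray()
        index = {}  # asset name -> (offset, length) inside this level's blob
        for name in names:
            data = assets[name]
            index[name] = (len(blob), len(data))
            blob += data
        packs[level] = (bytes(blob), index)
    return packs


def read_asset(pack, name):
    """Slice one asset back out of a level blob using its index."""
    blob, index = pack
    offset, length = index[name]
    return blob[offset:offset + length]


# Hypothetical assets; the sizes are arbitrary.
assets = {
    "rock.tex": b"R" * 400,
    "tree.tex": b"T" * 300,
    "lava.tex": b"L" * 500,
}
# "rock.tex" is used by both levels, so it ends up stored twice.
levels = {
    "forest": ["rock.tex", "tree.tex"],
    "volcano": ["rock.tex", "lava.tex"],
}

packs = pack_levels(assets, levels)
unique_size = sum(len(data) for data in assets.values())
packed_size = sum(len(blob) for blob, _ in packs.values())
# Levels whose blob would have to be rewritten if "rock.tex" changed.
touched = [level for level, names in levels.items() if "rock.tex" in names]
```

    Here the packed data is larger than the unique assets (1600 bytes against 1200), and changing the one shared texture means rewriting both level blobs, which is the kind of thing that could inflate both install size and patch size. In exchange, loading a level is a single sequential read.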

  • ruh roh

  • A privately owned platform cannot serve the public good. There will always be conflicts of interest. A proper public square should be funded by a competent government (but those are rare) or decentralized.