Sure! You’ll probably want to look at train-text-from-scratch in the llama.cpp project; it runs on pure CPU. The (admittedly sparse) docs should help, and otherwise ChatGPT is a good help if you show it the code. NanoGPT is fine too.
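If it helps, this is roughly the skeleton you end up with either way. It's a toy character-level model in PyTorch (a plain bigram table instead of a transformer, just to keep it tiny), and `corpus.txt` is a placeholder filename, so treat the whole thing as a sketch rather than what those projects actually ship:

```python
# Toy char-level language model training loop (nanoGPT-style, stripped down).
# Assumes PyTorch is installed and a plain-text file corpus.txt exists.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = open("corpus.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class BigramLM(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # each token directly predicts logits for the next token
        self.table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx):
        return self.table(idx)

model = BigramLM(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
block, batch = 64, 32

for step in range(1000):
    # sample random (input, next-char) training windows from the corpus
    ix = torch.randint(len(data) - block - 1, (batch,))
    x = torch.stack([data[i:i + block] for i in ix])
    y = torch.stack([data[i + 1:i + block + 1] for i in ix])
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(step, loss.item())
```

A real run swaps the bigram table for a transformer block and trains a lot longer, but the data loading and loss loop look basically the same.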
For a dataset, maybe you could train on French Wikipedia, or scrape a French story site or fan fiction or whatever. Wikipedia is probably easiest, since they provide downloadable offline dumps that are only a couple of gigs.
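If you go the Wikipedia route, the dumps live under dumps.wikimedia.org, and something like this pulls the raw article text out. The URL pattern and the XML namespace string are from memory, so double-check them, and you'd still want something like wikiextractor or mwparserfromhell to strip the wiki markup afterwards:

```python
# Sketch: download the French Wikipedia dump and dump raw article text to a file.
# URL and namespace are assumptions; verify against dumps.wikimedia.org.
import bz2
import urllib.request
import xml.etree.ElementTree as ET

DUMP_URL = "https://dumps.wikimedia.org/frwiki/latest/frwiki-latest-pages-articles.xml.bz2"
urllib.request.urlretrieve(DUMP_URL, "frwiki.xml.bz2")  # a few GB, takes a while

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # schema version may differ

with bz2.open("frwiki.xml.bz2", "rb") as f, open("frwiki.txt", "w", encoding="utf-8") as out:
    for _, elem in ET.iterparse(f):
        if elem.tag == NS + "text" and elem.text:
            out.write(elem.text + "\n")
        elem.clear()  # keep memory usage flat on a multi-GB file
```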
A simple way would be to load the comments themselves and then check each one against your blocked users. But that would basically DDoS the instance servers and would be extremely janky lol
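Something like this is what I mean, just to show where the jank comes from. The endpoint path and response shape are made up / Lemmy-ish, not the real API:

```python
# Naive client-side filtering sketch: fetch everything, filter locally.
# Doing this (plus any per-author lookups) for every post and every page load
# is a lot of extra requests against the instance.
import requests

INSTANCE = "https://example-instance.tld"  # hypothetical instance

def fetch_comments(post_id: int) -> list[dict]:
    # hypothetical endpoint returning all comments for a post
    resp = requests.get(f"{INSTANCE}/api/v3/comment/list", params={"post_id": post_id})
    return resp.json()["comments"]

def visible_comments(post_id: int, blocked: set[str]) -> list[dict]:
    comments = fetch_comments(post_id)
    # drop anything written by a blocked user, entirely on the client side
    return [c for c in comments if c["creator"]["name"] not in blocked]
```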
The compression technology a diffusion model would have to achieve to realistically (i.e. not too lossily) store “the training data” would be more valuable than the entirety of the machine learning field right now.
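For a sense of scale, here's the back-of-envelope math. The checkpoint size, image count, and per-image size are all ballpark assumptions on my part, not exact figures:

```python
# Rough numbers: a Stable Diffusion-class checkpoint vs. a LAION-scale training set.
checkpoint_bytes = 4e9    # ~4 GB of weights (assumed)
num_images = 2e9          # ~2 billion training images (assumed)
avg_image_bytes = 100e3   # ~100 KB per already-JPEG-compressed image (assumed)

dataset_bytes = num_images * avg_image_bytes
print(f"dataset size: ~{dataset_bytes / 1e12:.0f} TB")                       # ~200 TB
print(f"bytes of weights per image: {checkpoint_bytes / num_images:.1f}")    # ~2 bytes
print(f"implied compression ratio: ~{dataset_bytes / checkpoint_bytes:,.0f}:1")  # ~50,000:1
```

Two bytes per image isn't storage, it's a statistical summary, which is the whole point.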
I dunno. Every time this happened to me, it just spat out some invalid link, or by sheer luck, a valid but completely unrelated one. This probably happens because it hits its context limit, only sees “poem”, and then tries to predict the token after “poem”, which apparently is some sort of closing note. What I’m trying to argue is that this is just sheer chance; there are only so many permutations of text.
but it's yummy
also expensive as fuck, wtf. The last time I had it, it was like 12 bucks. Never again.