egeres

I can't believe I'm getting excited about phone specs again 🙌🏻✨. Do you think it would be possible to parallelize computation across several phones to run inference on transformer models? I assume it's not worth it, since you would need to transfer a ton of data between devices to run attention at every layer, but the llama people have pulled off so many tricks at this point...

egeres (@egeres@lemmy.ml) · Posts: 1 · Comments: 1 · Joined 2 yr. ago
