Retentive Network: A Successor to Transformer for Large Language Models
noneabove1182 @ noneabove1182 @sh.itjust.works
Yup, especially with all the quadratic scaling, we need to break away from it.