Is the 4k context length of Llama 2 for real?
noneabove1182 (@noneabove1182@sh.itjust.works) · Posts: 89 · Comments: 149 · Joined 2 yr. ago
You raise an interesting point, though: most examples likely follow exactly the pattern you suggest. Getting the model to focus on content in the middle of the context would take a large amount of training data aimed specifically at that, and there probably just isn't enough of it in the dataset.