Everything. It does some re-encoding when it retrieves content from other instances and you can set limits for pictrs (the software Lemmy uses to host media) regarding file sizes etc.
Edit: I was partially wrong about what is cached, see my original comment
This will differ greatly from instance to instance. The people running lemmy.world have published some info on their infrastructure. My instance is running on a rather small VPS with 100GB storage, but I will have to rethink my solution rather soon as images and videos from my subbed communities [Edit: which are stored on outside sites] are eating around a gigabyte per day and I think this is likely to increase.
Edit: I want to clarify that I was partially wrong - Lemmy only locally caches content which is hosted on outside sites (e.g. imgur). It does not cache content that was directly uploaded to another Lemmy instance and just embeds the source media.
That is not true. As long as a user on your instance is subscribed to a community, the media content of posts [Edit: only posts linking to outside sources, e.g imgur] of that community is stored locally on your instance as well.
This, of course, only applies to media which is uploaded to Lemmy, links to media hosted externally are not downloaded.
Edit: I want to clarify that I was partially wrong - Lemmy only locally caches content which is hosted on outside sites. It does (should?) not cache content that was directly uploaded to a Lemmy instance and just embeds the source media.
Apart from the few gigs of really private and self-made data, most of it would probably be replacable, it's just a matter of how much work that would be. On the other hand, I wonder how much of my media collection I would actually miss were it to get lost.
Everything. It does some re-encoding when it retrieves content from other instances and you can set limits for pictrs (the software Lemmy uses to host media) regarding file sizes etc.
Edit: I was partially wrong about what is cached, see my original comment