Posts
13
Comments
262
Joined
2 yr. ago

  • yep, another big outage.

  • ok, I did know about that, just didn't memorize the name. I'm assuming only private messages and user account info (email address) are the real concern in terms of exposure? It's mostly a public posting thing, or not?

  • I still consider the number of users on Lemmy to be pretty low; the performance bugs need to be addressed on a big server. Bugs like not having a WHERE clause on an UPDATE, so that it hits 1500 rows in a table (one row per server) instead of a single row... these need to be shaken out.

    The errors of the overload themselves have been a way to throttle growth of the big servers. People were not able to insert new posts and comments into Lemmy.ml - reducing outbound federation activity too, and they went to other servers. This went on all of June and July.
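To make the missing-WHERE bug concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table layout, column names, and LOCAL_SITE_ID are assumptions for illustration, not Lemmy's actual schema or code.

```python
import sqlite3

# Hypothetical stand-in schema: one aggregate row per known server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_aggregates ("
    "site_id INTEGER PRIMARY KEY, comments INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO site_aggregates (site_id) VALUES (?)",
    [(i,) for i in range(1, 1501)],
)

# Buggy form: no WHERE clause, so every row in the table is rewritten.
buggy_rows = conn.execute(
    "UPDATE site_aggregates SET comments = comments + 1"
).rowcount

# Fixed form: only the single row for the one server is touched.
LOCAL_SITE_ID = 1  # assumption: the id of the local server's row
fixed_rows = conn.execute(
    "UPDATE site_aggregates SET comments = comments + 1 WHERE site_id = ?",
    (LOCAL_SITE_ID,),
).rowcount

print(buggy_rows, fixed_rows)
```

The rowcount of each statement shows the difference in write volume: the first form rewrites every server's row on each new comment, the second rewrites one.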

  • subscribe to every community, and let federation load overwhelm your server.

    Did that, takes lots of time to wait for the content to come in.... and there is no backfill. Plus I suspect that the oldest servers (online for several years) have some migration/upgrade related data that isn't being accounted for.

  • specific API version at a time so there’s some limitations right now as instances upgrade from 0.18.2 to 0.18.3

    What API-breaking changes did you find?

  • You’re thinking about it the wrong way.

    I've had to go through a major change in thinking and adjust my interpretation in major ways.

  • There have also been problems with federated copies of communities not getting all the actions. I added testing code to demonstrate that comment deletes were not going out to federation peers. Comparing copies of data between instances for the same community shows some overlooked problems. Still have more tests to add.
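A rough sketch of the kind of comparison described above, assuming each instance's copy of a community has already been reduced to a mapping from comment ID to a deleted flag. The function name and data shape are hypothetical and are not taken from Lemmy's test suite.

```python
from typing import Dict, List


def find_missed_deletes(home: Dict[str, bool], peer: Dict[str, bool]) -> List[str]:
    """Return IDs of comments deleted on the home instance but still live on the peer."""
    return sorted(
        comment_id
        for comment_id, deleted in home.items()
        if deleted and peer.get(comment_id) is False
    )


# Hypothetical snapshots of the same community on two instances.
home_copy = {"c1": False, "c2": True, "c3": True}
peer_copy = {"c1": False, "c2": False, "c3": True}

missed = find_missed_deletes(home_copy, peer_copy)
print(missed)
```

Comments that never reached the peer at all are deliberately ignored here; that is a separate federation gap and would need its own check.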

  • A big part of the problem is that a new instance starts with zero database content and PostgreSQL performs fine with the way Lemmy organizes the data. But then there isn't anything for people to read, and search is only going to pick up local stuff.

  • Things have been incredibly unstable there.

    I wish lemmy.ml (also unstable) or lemmy.world would hand out a (nearly) full copy of the database so we can get more analysis done on PostgreSQL performance behaviors. Remove the private comments and password/2FA/user data, or whitelist only the comments/posts/communities/person tables, since most everything else should already be public information that's shared via the API or federation anyway. It's the quantity, grouping, and age of the data that's hard to reproduce in testing, along with knowledge of other federated servers and even data that may have been generated by older versions of Lemmy that new versions can't reproduce.

    It's been over 60 days of constant PostgreSQL overload problems and last week Lemmy.ca made a clone of their database to study offline with AUTO_EXPLAIN which surfaced a major overload on new comments and posts related to site_aggregates counting (it was counting each new post/comment against every known server, not just the single database row for a server).

    I have an account over on World too, and every major Lemmy server I use throws errors with casual usage. It's been discouraging, I haven't visited a website with this many errors in years. Today (Sunday) has actually been better than yesterday, but I do not see many new postings being created on lemmy.ml today.

  • Sunday, lemmy.ml is performing better than I have seen it in 60 days. I did get errors on Saturday and significant lag. Not sure if activity is just today or what, but it's really been fast in routine browsing.

  • I've had to really adjust my thinking with this project. They want to do things a very particular way and it goes back 4 years, and a lot of the mistakes are just now getting noticed/attention. For example, comments were not deleting on all the servers. I was testing that after comparing server copies of the same communities and finding they were not the same. It just didn't seem to have a lot of people spot-checking it for mistakes. I am learning to just "go with the flow" and face that it's more like how musicians would approach design and running a project. Media-focused systems can be that way.

  • I think 0.18.3 fixed some of it, but there are likely some more performance issues related to PostgreSQL lurking in Lemmy.

    A TRIGGER in SQL is logic that executes in response to other activity in the database.

    Lemmy uses them so that when you create a new comment or post, it executes code to insert tracking records for votes and comments on a post. One of the things Lemmy maintains this way is called site_aggregates, and there was a bug where it was updating the counts for 1500 servers instead of just the one server. That got fixed in 0.18.3.

    Deleting accounts in Lemmy was causing crashes. I'm not sure if that has been entirely resolved. These things are all kind of hidden in the background of the code, so a lot of developers overlooked that there were problems in them.
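As an illustration of how a trigger keeps a count up to date, here is a small sketch using Python's built-in sqlite3 rather than PostgreSQL. The table names, the trigger name, and the hard-coded site_id = 1 are assumptions for the example and do not reproduce Lemmy's real trigger code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE site_aggregates (
        site_id  INTEGER PRIMARY KEY,
        comments INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO site_aggregates (site_id) VALUES (1), (2), (3);

    -- Runs automatically after every insert into comment.
    -- The WHERE clause limits the update to the one local server row
    -- (site_id = 1 is an assumption for this sketch).
    CREATE TRIGGER comment_count_after_insert
    AFTER INSERT ON comment
    BEGIN
        UPDATE site_aggregates SET comments = comments + 1 WHERE site_id = 1;
    END;
    """
)

# The application only inserts the comment; the trigger updates the count.
conn.execute("INSERT INTO comment (body) VALUES ('hello')")
counts = dict(conn.execute("SELECT site_id, comments FROM site_aggregates"))
print(counts)
```

Dropping the WHERE clause inside the trigger body would bump the count on every server's row for each new comment, which is the shape of the bug described above.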

  • The NASA statement this week opened my mind up to non-human intelligence on Earth. I started looking into when dinosaurs were really taken seriously, and it wasn't until 1800. Think about how many humans ignored that evidence. Maybe some intelligence evolved on Earth and had brains that found natural physics and chemistry more teachable than our learning in school classrooms, found ways to open dimensions and just left Earth for some place better, but still comes back to check on us lower species, ha.

  • Recorded media, electronic media, is something the founding fathers never had to deal with.

  • a lot of db tuning was being avoided

    and I did not understand or properly relate to that project culture. It had been that way for years and I should have "read the room" and "gone with the flow".

  • This has nothing to do with rockstar culture

    Then I'm confused, because that was my own idea.

    the fact that you’re spending 10x the amount of typing complaining about an issue

    I'm no longer complaining, you convinced me, I love them like Rock Stars now and I have formally apologized and explained how wrong I was in my thinking because of my past memories of running mission-critical PostgreSQL servers. Are we clear now? It's all about Style and Fashion, and I got way too worked up about crashes.

    So either you don’t want it fixed because you prefer to complain and die on your sword, or you don’t know how to fix it.

    I don't get this. Why are you making it about me? Do you think I am the one who opened GitHub issue 2910? Is that your accusation? That I created a fake account and opened issue 2910? I was not worried about me. Even in June I was not worried about me personally. I was worried about the person who opened the issue, is that understandable to you?

    I was worried about Reddit users encountering server crashes. This isn't about one person, me. This is about thousands of people and a June 30 deadline.

    But 12 hours ago, I changed direction. I did not realize just the kind of culture and "Rock Star" attitude that was going on. I was focused on Reddit June 30, and I didn't see that the social conventions were far more important than server crashes. It was a mistake for me to be worried so much about data and crashes when that isn't the culture here. I am finding everyone thinks it is "cool" and "fine" that it took over a month for 2910 to be resolved. I never expected that; it was me who was socially out of touch.

    I really got lost socially and regret my attitude problem. I should have learned back in March, with Elon Musk running Twitter now, that the rules for social media cultures are vastly different from my measures for what I would consider to be "cool" regarding a server crash issue. Not one person has said that 2910 should have been addressed within 3 or 4 days of being created. So I know now that it is me who has to change.

    If there is actually an issue I expect someone else who is actually levelheaded and reasonable will identify it and submit a PR.

    Do you think the issue isn't fixed or something? This is a postmortem discussion. You seem confused. Or do you have some other confusion, like thinking Issue 2910 is about me personally?

  • Then go fix it and open a PR

    Do you think I am the one who created the mistake or something? That I have access to the servers to install it?

    It's so odd to me that you respond this way, as if it was my coding mistake. It isn't even me who opened the issue, that is GitHub "makotech222" - is that your answer to them?

  • If the problem is easy to solve, then go solve it, open a PR, and come back here once you’ve done so.

    Why... That isn't going to get it installed on the servers they are running. I failed to see that this is a "Rock Star" culture, and the audience does not interpret months of Issue 2910 getting no attention as a problem. There are social forces that are non-technical, and I wildly misinterpreted the situation. You personally have really made the case to me just how wrong I am. Again, I am sorry I made such a fuss and misunderstood.

    Be like Phiresky, actually put your code where your mouth is.

    Why... That isn't going to get it installed on the servers they are running. I know the change was not difficult for anyone to do. I failed to see that this is a "Rock Star" culture. Look at how you know them by names, and how much you respect that. I just didn't appreciate the 4 years of style and fashion so fully.

    Lastly, I don’t know if you were aware of this, but the Lemmy devs don’t owe you anything.

    Such an interesting discussion. Do you believe Reddit owes you something? Do you believe Linux owes you something? Such an interesting topic.