
  • I am more interested in being able to observe metrics for each node individually rather than in aggregate.

    This requirement makes me think netdata would be a good solution. In my current setup, each host has its own netdata dashboard and manages its own health checks/alarms. I have also enabled streaming, which sends metrics from all hosts to a "parent/master" netdata instance, so I can see metrics from every host without checking each dashboard individually.
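
    For reference, a minimal sketch of that streaming setup (hostnames and the API key are placeholders; both files are stream.conf, edited with netdata's edit-config helper):

    ```ini
    # stream.conf on each child node: push metrics to the parent
    [stream]
        enabled = yes
        destination = parent.example.org:19999
        api key = 11111111-2222-3333-4444-555555555555

    # stream.conf on the parent: accept children presenting that key
    [11111111-2222-3333-4444-555555555555]
        enabled = yes
    ```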

    However, it looks like it does not store the metrics for very long.

    I still have to look into this. In the past that was certainly true, and you had to set up a Prometheus instance to store metrics for long-term archival (and downsample them - who needs few-second resolution for year-old metrics?). Looking at the documentation right now, though, it seems possible to store long-term metrics in the netdata DB itself, by moving old metrics to a lower-resolution storage tier: https://learn.netdata.cloud/docs/configuring/optimizing-metrics-database/change-how-long-netdata-stores-metrics
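
    Tiering is configured in netdata.conf; roughly like this (option names change between releases, so treat these as illustrative and check the page above for your version):

    ```ini
    # netdata.conf - keep 3 tiers, with a disk budget per tier
    [db]
        mode = dbengine
        storage tiers = 3
        dbengine multihost disk space MB = 1024
        dbengine tier 1 multihost disk space MB = 1024
        dbengine tier 2 multihost disk space MB = 1024
    ```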

    An important additional advantage is that it comes packaged on Debian (all my machines run Debian).

    Same. However, I install and update it from their third-party APT repository - it's one of the rare cases where I prefer upstream releases to Debian stable packages. The last few upstream releases have been really nice (for example, I'm not sure the new tiered retention system is available in the v1.37.1 Debian stable package).

    My automated installation procedure (ansible role) is here if you're interested (start at tasks/main.yml and follow the import_tasks).

  • you have to pay a subscription for the option to disable individual alert types

    Never heard of that. You can disable individual alarms by setting their notification recipient to: silent in the relevant health.d configuration file (e.g. health.d/disks.conf). This is exactly what I do for packet drops.
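
    A sketch of what that looks like (the template name below is illustrative - edit your local copy of the stock file via edit-config, keep the definition, and only change its to: line):

    ```
    # health.d/disks.conf (local override)
    template: disk_space_usage
          on: disk.space
          to: silent
    ```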

  • I serve HTTP 403 for all requests to the default vhost and log them, harvest the IPs through a log aggregator (or just fail2ban), tag them as bad bots/scanners, and eternal-ban them on all my hosts. I currently have 98451 addresses or networks in my ipset for these.

    For requests to actual domains, I ban after a few unsuccessful authentication attempts. A WAF is nice to have (tedious but fun to set up) - currently working on improving my Modsecurity setup.
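
    As a sketch of the ban mechanism (set name and addresses are made up; the real setup feeds the set from the log aggregator):

    ```sh
    # one-time setup: a set of banned sources, dropped before anything else
    ipset create badactors hash:net
    iptables -I INPUT -m set --match-set badactors src -j DROP

    # ban a harvested address or network
    ipset add badactors 203.0.113.42
    ipset add badactors 198.51.100.0/24
    ```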

    Other than that there is already good advice here:

    • keep OS/packages/installed services up-to-date
    • only run software from trusted (ideally signed) sources
    • use host and network-based firewalls
    • use strong encryption and authentication everywhere
    • only expose what is absolutely required
    • implement good privilege separation (even dedicated users for each app/service; proper file ownership/permissions go a long way)
    • run scanners to detect possible misconfigurations and missing hardening measures (systemd-analyze security was mentioned, I also like lynis and debsecan)
    • set up proper logging/monitoring/alerting
  • Been running multiple Nextcloud instances for years on a bog-standard Debian + Apache + PHP-FPM install, as documented in the official docs, which do not even mention Docker. Upgrades were never a problem. Some apps may suffer from bugs from time to time, but Nextcloud itself works flawlessly. I wrote an ansible role to install, manage and update it. The only thing that deviates from the "recommended" setup is PostgreSQL instead of MariaDB. People need to start following the actual documented/well-supported installation options and stop trying to stick containers everywhere...
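
    For what it's worth, a minimal sketch of the upgrade step from such a role (path and user are assumptions; occ upgrade itself is the documented command after unpacking a new release):

    ```yaml
    - name: run the nextcloud database/app upgrade
      ansible.builtin.command: php occ upgrade
      args:
        chdir: /var/www/nextcloud
      become: true
      become_user: www-data
    ```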

  • Unfortunate name collision with another project related to self-hosting: https://github.com/progmaticltd/homebox

    The website could use one or two screenshots of important features, so visitors don't have to log in to the demo to see them.

    Painting the window is noticeably slow (Firefox ESR 102) - have you tested it with Firefox?

    Other than that, well done! It looks good and well maintained. I wanted to evaluate Snipe-IT someday; maybe I will also give this a try.

  • This is the only real answer - it is not possible to do proper capacity planning without trying the same workload on similar hardware [1].

    Some projects give an estimate of resource usage based on a number of factors (simultaneous/total users...), but most don't, and even those estimates may be far from actual usage during peak load, with many concurrent services, etc.

    The only real answer is close monitoring of resource usage and response times (possibly with alerting), then adding resources or cutting down on resource-hungry features/programs if resource usage goes over a certain threshold (~80% is when you should start paying attention) and/or performance starts to degrade.
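
    As a sketch, the ~80% rule expressed as a netdata alarm on RAM (chart and dimension names follow the stock system.ram chart; the thresholds are the interesting part):

    ```
    alarm: ram_in_use
       on: system.ram
     calc: $used * 100 / ($used + $cached + $free + $buffers)
    units: %
    every: 10s
     warn: $this > 80
     crit: $this > 90
    ```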

    My general advice is to max out installed RAM from the start, virtualize your hosts (which makes it easier to add/remove resources or migrate a hungry VM to more powerful hardware later), and watch out for disk I/O on certain workloads (databases... having DB engines running off SSDs helps greatly).

  • You can perfectly well deploy Docker stacks using ansible. This is what I used to do for rocket.chat: [1] [2] (I ditched it for Matrix/Element without Docker, but the concept stays valid).
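
    A minimal sketch of such a task, assuming the community.docker collection and a compose file already deployed to the host (module choice and path are assumptions):

    ```yaml
    - name: bring up the rocket.chat stack
      community.docker.docker_compose_v2:
        project_src: /opt/rocketchat
        state: present
    ```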

    I’m not to the point where the specifics of every system is in Ansible yet.

    What I suggest is writing a playbook that lists the roles attached to your servers, even if the roles actually do nothing:

    ```yaml
    # playbook.yml
    - hosts: myhomeserver.example.org
      roles:
        - debian-base
        - docker
        - application-x
        - service-y

    - hosts: mydevserver.example.org
      roles:
        - debian-base
        - application-z
    ```

    ```yaml
    # roles/application-x/tasks/main.yml
    - name: setup application-x
      debug:
        msg: "TODO This will one day deploy application-x. For now the setup is entirely manual and documented in roles/application-x/README.md"
    ```

    ```yaml
    # roles/service-y/tasks/main.yml
    - name: setup service-y
      debug:
        msg: "TODO This will one day deploy service-y. For now the setup is entirely manual and documented in roles/service-y/README.md"

    #...
    ```

    This is a good start for a config management/automated deployment system. At least you will have an inventory of hosts and what's running on them. Work your way up from there, progressively converting your manual install/configuration steps to automated procedures over time. There are a few steps that even I didn't automate (like configuring LDAP authentication for Nextcloud), but they are documented in the relevant role README [3].

  • dia diagram editor (desktop application). I like that .dia diagrams can be exported to PNG from the command line, so this fits well in my automated setup (edit the diagram in dia, run make doc, and the PNG diagram is updated and embedded in my project's README).
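
    The export step looks roughly like this (file names are made up; as far as I know dia's --export infers the output format from the extension):

    ```make
    doc: infra.png

    infra.png: infra.dia
    	dia --export=$@ $<
    ```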

    As others said, draw.io is also nice.

  • ansible, self-documenting. My playbook.yml has a list of roles attached to each host, and each host's host_vars file has details on service configuration (domains, etc.). It looks like this: https://pastebin.com/6b2Lb0Mg
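
    Schematically, a host_vars file looks something like this (variable names are made up; the real layout is in the pastebin above):

    ```yaml
    # host_vars/myhomeserver.example.org.yml
    nextcloud_fqdn: cloud.example.org
    matrix_fqdn: matrix.example.org
    ```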

    Additionally this role generates a markdown summary of the whole setup and inserts it into my infra's README.md.

    Manually generated diagrams, odd manual maintenance procedures and other semi-related stuff get their own sections in the README (you can check the template here) or linked markdown files. Ongoing problems/research go into the infra gitea project's issues.

  • It depends what your interests are. I have 364 feeds in my reader (granted, a lot of these are from Reddit [1] - I wouldn't be surprised if they removed this way to consume posts someday...), on various topics: IT/tech stuff, news, DIY, history/art blogs, YouTube channels [2], Bandcamp album releases [3]...

    When I find an interesting article by just randomly browsing/following links, I often check if the website/blog has an RSS feed and just add it to my feed reader. If it's too noisy, I end up adding filters to discard (or auto-mark as read) bad articles based on title/content/tags.

    Random recommendation: https://solar.lowtechmagazine.com/ (https://solar.lowtechmagazine.com/about/the-solar-website)

    • distribution packages: unattended-upgrades + netdata-apt for upgrades that may slip through the cracks
    • github/gitlab/... projects: RSS feeds for releases (/releases.atom) or commits (/commits.atom)
    • the rest: nvchecker (see the sketch below)
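
    For reference, a minimal nvchecker sketch (2.x TOML syntax; the entry is just an example):

    ```toml
    # .nvchecker.toml
    [netdata]
    source = "github"
    github = "netdata/netdata"
    use_max_tag = true
    ```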