Posts: 3 · Comments: 469 · Joined: 2 yr. ago

  • I don't know why you feel it's necessary to write that several times. Yes, Matrix doesn't enforce e2ee, that's why I said you'd need to make sure to enable it. But what is your point? You can use Matrix how you like, with or without, make it secure or not. Sure, that has implications. But Lemmy is also not end-to-end encrypted. Neither is a private conversation in my living room... Other messengers have features to back up or restore keys, or to flag and moderate posts, which may in some cases circumvent e2ee. Or not. So what are you getting at? Would you like us to use a different messenger? I'd be happy to hear constructive arguments. I also had issues with niche clients not supporting encryption or not supporting emoji verification since they aren't forced to implement it.

  • Not in the technical sense. I mean you can choose whether your data packets traverse the Atlantic, but I don't think that's noticeable in a messenger. It changes the legislation, though. Since you mentioned security and privacy as goals... there are laws server admins have to abide by. And these laws vary greatly among countries. For example, the whole EU has stricter privacy laws than most of the US. And there are things like lawful interception, the intelligence apparatus and general surveillance. I'd keep that in mind when looking for the most secure server.

  • Oh, that's quite some requirements. I honestly don't know, I operate my own small instance for me and my family. I just wanted to add that Matrix is supposed to be end-to-end encrypted. So if you activate that in your chats and rooms, the server operator can't read your messages anyway. But they can collect metadata. For example they know who you're texting and when, and they know where you're connected from. But they can't see the actual messages.

  • What they mean by that is probably the fact that you can download the model, run it on your own hardware and adapt it. Contrary to what OpenAI does: they just offer a service and don't give access to the model itself, so you can only use ChatGPT through their servers.

    Most of the models come with a GitHub repo with code to run them and benchmarks. But it's more or less just boilerplate code to get them running in one of the well-established machine learning frameworks. Maybe a few customizations and the exact setup to get a new model architecture running. It would usually be something like Hugging Face's Transformers library. There are a few other big projects which are used by people. If researchers come up with new maths, concepts and new architectures, these eventually get implemented there.
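
    To give a rough idea of what that boilerplate usually boils down to once a model is ported to Transformers, here's a minimal sketch (the model name is just a placeholder, not any specific release):

    ```python
    # Minimal sketch of the kind of inference boilerplate that ships with many
    # open-weight models once they are available in Hugging Face Transformers.
    # The repo id below is a placeholder; any causal LM on the Hub works similarly.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "some-org/some-open-model"  # hypothetical repo id

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("The meaning of open source is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```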

    But the code that gets released alongside new models is usually meant for scientific repeatability and not necessarily for actual use. It might contain customizations that make it difficult to incorporate into other things, usually isn't maintained after the release, and most of the time it is based on old versions of libraries that were state of the art when they started with their research. So that's usually not what gets used by people in the end.

    Interestingly enough, companies all use different phrasing. Mistral AI claims to be committed to being "open & transparent", yet they like to drop torrent files for new models that come with zero explanation and code. And OpenAI still carries the word "open" in their company name, but at this point openness is more a hint of an idea from their very early days.

    Anyway, inference code and the model aren't the same thing. It's more like we were talking about cake recipes and you provided me with the schematics of a KitchenAid.

  • The training data for OpenLlama is called RedPajama, if I'm not mistaken. It's a reproduction of what Meta used to train the first LLaMA. Back then they listed the datasets in the scientific paper. Nowadays they and their competitors don't do that anymore.

    OpenLlama performs about as well as (slightly worse than) the first official LLaMA. And both perform worse than Llama2. It's not night and day, but I think a noticeable improvement. And Llama2 has twice the context length, which is a huge improvement for some use-cases.

    If you're looking for models with a different license, there are a few. Mistral is Apache 2.0, and there are several others with permissive licenses.

    If you're looking for info on what datasets the big players use, forget it (my opinion). The companies are all involved in legal battles over copyright and have stopped publishing what they use. Many (except for Meta) have kept it a (trade) secret from the beginning and never shared such information. It's unscientific because it doesn't allow for repeatability. But AI is expensive and everyone is currently trying to get obscenely rich with it or striving for world domination.

    But datasets are available, like the RedPajama one and several other collections for various purposes... There are lots of datasets for fine-tuning and a whole community around that. It's just for the base/foundation models that we don't have access to a current state-of-the-art dataset.

  • And they don't provide the source... So it's neither open nor source. I get why and how Meta tries to make themselves look better. And I'm grateful for having access to such models. But I think words have meanings, and journalists should do better than repeat that phrasing and help water down the meaning of 'open source'. (Which technically doesn't mean free or without restrictions, but is often used synonymously.)

  • Ah, well, I just learned about the existence of free VPN services. I'm going to use one to set up a free guest wifi, so the neighbors, guests (and I) can do whatever with it. But I also struggle with the setup. It's complicated to get the WireGuard interface set up, isolate the guest wifi, and configure the split routing and everything so the different wifis on the router forward their traffic over different services.

  • Fair enough. I got confused by their FAQ. They say WireGuard is supported on their free plan. But there is no config available with the keys, so you have to use their client to connect.

    I recently registered an account and wanted to do something similar. Guess it isn't that easy then. Another possibility is to use protonvpn.com; they also offer a free tier and you can connect any WireGuard client to that.

    Or you switch protocols and use, for example, IKEv2 with strongSwan, or OpenVPN, or whatever hide.me offers in addition to WireGuard. I think gluetun also does OpenVPN. But hide.me isn't listed for some reason.

  • Thank you for pointing out that my arguments don't necessarily apply to reality. Sometimes I answer questions too directly. And the question wasn't "should I use a firewall", or I would have answered with "probably yes."

    I think I have to make a few slight corrections: I think we use the word "timing attack" differently. To me, a timing attack is something that relies on the exact order or timing of arriving packets. I was thinking of something like Tor does, where it shuffles packets around, waits a few milliseconds, merges them or pads them so they all have the same size. Brute-forcing something isn't exploiting the exact time a certain packet arrives; it's just sending many of them while the other side lets the attacker try an indefinite number of passwords. I wouldn't put that in the same category as timing attacks.
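
    To make that distinction concrete, here's a toy sketch of the kind of mitigation I mean (not how Tor actually implements it, just the idea of padding and batching so sizes and exact arrival times don't leak much):

    ```python
    # Toy illustration of a timing/size mitigation: pad every message to a fixed
    # cell size and flush in fixed-interval batches, in shuffled order, so an
    # observer learns little from packet sizes or exact arrival times.
    import random
    import time

    CELL_SIZE = 512        # every cell on the wire has the same length
    FLUSH_INTERVAL = 0.05  # send in fixed 50 ms batches instead of immediately

    queue = []

    def enqueue(message: bytes) -> None:
        # pad (or truncate) so every cell has the same size
        queue.append(message.ljust(CELL_SIZE, b"\x00")[:CELL_SIZE])

    def flush(send) -> None:
        random.shuffle(queue)  # hide the original ordering
        while queue:
            send(queue.pop())

    # toy usage: the "send" just reports what an observer on the wire would see
    enqueue(b"hi")
    enqueue(b"a much longer message that still ends up the same size on the wire")
    time.sleep(FLUSH_INTERVAL)
    flush(lambda cell: print(f"cell of {len(cell)} bytes"))
    ```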

    Firewall vs MySQL: I don't think that is a valid comparison. The firewall doesn't necessarily look into the packets and detect that someone is running a SQL injection. Both do a very different job. And if the firewall doesn't do deep packet inspection or rate limiting or something, it just forwards the attack to the service and it passes through anyway. And MySQL probably isn't a good example, since it rarely should be exposed to the internet in the first place. I've configured MariaDB to listen only on the internal interface and not to accept connections from other computers. Additionally, I didn't open the port in the firewall, but MariaDB doesn't listen on that interface anyway.

    Maybe a better comparison would be a webserver with HTTPS. The firewall can't look into the packets because it's encrypted traffic. It can't tell an attack apart from a legitimate request and just forwards them to the webserver. Now it's the same with or without a firewall. Or you terminate the encrypted traffic at the firewall and do packet inspection or complicated heuristics. But that shifts the complexity (including potential security vulnerabilities in complex code) from the webserver to the firewall. And it's a niche setup that also isn't well tested. And you need to predict the attacks. If your software has known vulnerabilities that won't get fixed, this is a valid approach. But you can't know future attacks.

    Having a return channel from the webserver/software to the firewall so the application can report an attack and order the firewall to block the traffic is a good thing. That's what fail2ban is for. I think it should be included by default wherever possible.
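
    Conceptually it's not much more than this sketch (not the real fail2ban code; it assumes an nftables table `inet filter` with a set named `banned` already exists, and that the application reports failures):

    ```python
    # Toy version of the fail2ban idea: the application reports failed logins,
    # and once an address crosses a threshold we tell the firewall to drop it.
    # Assumes `nft add table inet filter` and a set `banned` were created beforehand.
    import subprocess
    from collections import Counter

    MAX_FAILURES = 5
    failures = Counter()

    def report_failed_login(ip: str) -> None:
        failures[ip] += 1
        if failures[ip] == MAX_FAILURES:
            ban(ip)

    def ban(ip: str) -> None:
        # roughly what a fail2ban "action" does, here via an nftables set
        subprocess.run(
            ["nft", "add", "element", "inet", "filter", "banned", f"{{ {ip} }}"],
            check=True,
        )
        print(f"banned {ip} after {failures[ip]} failed logins")
    ```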

    I think there is no way around using well-written software if you expose it to the internet (like a webserver or a service that is used by other people). If it doesn't need to be exposed to the internet, don't do it. Any means of ensuring that is alright. For crappy software that is exposed and needs to be exposed, a firewall doesn't do much. The correct tools for that are virtualization, containers, VPNs, and replacing that software... Maybe also the firewall, if it can tell apart good and bad actors by some means. But most of the time that's impossible for the firewall to tell.

    I agree. You absolutely need to do something about security if you run services on the internet. I do and have run a few services. Webservers (especially if you have a WordPress install or some other commonly attacked CMS), SSH and Voice-over-IP servers get bombarded with automated attacks, as the logs show. Same for Remote Desktop, Windows network shares and IoT devices. If I disable fail2ban, the attackers ramp up the traffic and I can see attacks scroll through the logfiles all day.

    I think a good approach is:

    1. Choose safe passwords and keys.
    2. Don't allow people to brute-force your login credentials.
    3. If you don't need a service, deactivate it entirely and remove the software.
    4. If you just need a service internally, don't expose it to the internet. A firewall will help, and most software I use can be configured to either listen for external requests or not. Also configure your software to listen only on localhost (127.0.0.1), or just on the LAN that contains the other things that tie into it (see the sketch after this list). Doing it at two distinct layers helps if you make mistakes, something happens by accident, or complexity or security vulnerabilities arise. (Or you're not in complete control of everything and every possibility.)
    5. If only some people need a service, either make it as secure as a public service or hide it behind a VPN.
    6. Perimeter security isn't the answer to everything. The subject is complex and we have to look at the context. Generally it adds something, though.
    7. If you run a public service, do it right. Follow state of the art security practices. It's always complicated and depends on your setup and your attackers. There are entire books written about it, people dedicate their whole career to it. For every specific piece of software and combination, there are best practices and specific methods to follow and implement. Lots of things aren't obvious.
    8. Do updates and backups.
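
    The sketch referenced in point 4: a service bound to the loopback interface never accepts connections from other machines, regardless of what the firewall does, which is exactly the second layer I mean.

    ```python
    # Minimal example of "just listen on localhost": this server is unreachable
    # from other machines even without a firewall, because it only binds to the
    # loopback interface instead of 0.0.0.0.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Hello(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"only reachable from this machine\n")

    HTTPServer(("127.0.0.1", 8080), Hello).serve_forever()
    ```
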
  • Sure, I didn't spell out what I meant by 'valid use-cases'. If it's just your private VPN or SSH endpoint, it's like blocking your bank card from being used abroad. It might backfire once you travel and forget about it. But I think it's a valid use case. Ultimately it's not the countries you want to block but the address ranges which get used by attackers. But security is complex; it may not be feasible to allow-list just the carriers you use to connect, or to find a suitable blocklist.

    I'd be happy if georestriction wasn't a thing and I could stream Doctor Who from the BBC and some news sites wouldn't refuse service to me because I live in the EU and they don't want to implement the GDPR.

    But I agree, this is just a tool. And it can be used for good things and bad things.

    I don't complain if the same tool is used to route my requests to a datacenter nearby.

  • Yeah, back when the war between Russia and Ukraine started, I saw people post tutorials about how to block people in Russia from accessing their blogs and self-hosted services. So just for political reasons. I don't think this makes the world a better place.

    Same with countries where lots of attacks originate from. I think a better approach would be to block offending address ranges if possible, not directly block countries and all the people who live there.

    I don't think something needs to directly promote bad behaviour. Sometimes just making it easy is enough to warrant a disclaimer to think before applying it.

  • Sure, maybe I've worded things too factually and not differentiated between theory and practice. But,

    1. "you know everything": I've said that. Configurations might change or you you don't pay enough attention: A firewall adds an extra layer of security. In practice people make mistakes and things are complex. In theory where everything is perfect, blocking an already closed port doesn't add anything.
    2. "There are no bugs in the network stack": Same applies to the firewall. It also has a network stack and an operating system and it's connected to your private network. Depends on how crappy network stacks you're running and how the network stack of the firewall compares against that. Might even be the same as on my VPS where Linux runs a firewall and the services. So this isn't an argument alone, it depends.
    3. Who mitigates timing attacks? I don't think this is included in the default setup of any of the commonly used firewalls.
    4. "open ports you are not even aware of": You open ports then. And your software isn't doing what you think it does. We agree that this is a use-case for a firewall. that is what I was trying to convey with the previous argument no 5.

    Regarding the summary: I don't think I want to advise people not to use a firewall. I thought this was a theoretical discussion about individual arguments. And it's complicated and confusing anyway. Which firewall do you run? The default Windows firewall is a completely different thing and setup than nftables on a Linux server that closes everything and only opens ports you specifically allow. Next question: How do you configure it? And where do you even run it? On a separate host? Do you always rent 2 VPS? Do you only do perimeter security for your LAN and run a single firewall? Do you additionally run firewalls on all the connected computers in the network? Does that replace the firewall in front of them? What other means of security protection did you implement?

    As we said, a firewall won't necessarily protect against weak passwords and keys. And it might not be connected to the software that gets brute-forced, and thus just forward the attack. In practice it's really complicated and it always depends on the exact context. It is good practice not to allow everything by default, but to take the approach of blocking everything and explicitly configuring exceptions, like a firewall does. It's not the firewall but this concept behind it that helps.

  • You're right. If you don't open up ports on the machines, you don't need a firewall to drop packets to ports that are closed and would drop them anyway. So you just need it if your software opens ports that shouldn't be available to the internet. Or if you don't trust the software to handle things correctly. Or things might change, and you or your users install additional software and forget about the consequences.

    However, a firewall does other things. For example, forwarding traffic. Or, in conjunction with fail2ban: blocking people who try to guess SSH passwords and connect to your server multiple times a second.

    Edit:

    1. “It’s just good security practice.” => nearly every time I've heard that, people followed up with silly recommendations or were selling snake oil.
    2. “You [just] need it if you are running a server.” => I'd say it's more like the opposite. A server is much more of a controlled environment than, let's say, a home network with random devices and people installing random stuff.
    3. “You need it if you don’t trust the other devices on the network.” => True, I could for example switch on and off your smarthome lights or disable the alarm and burgle your home. Or print 500 pages.
    4. “You need it if you are not behind a NAT.” => Common fallacy: "if A then B" doesn't mean "if B then A". Truth is, if you have a NAT, it does some of the jobs a firewall does (dropping incoming traffic).
    5. “You need it if you don’t trust the software running on your computer.” => True
  • Mmh. We should use georestriction with caution. Ultimately the internet was made to connect people, and blocking people based on their origin is an attack on freedom and equality. There are valid use-cases, though. Just don't take it lightly.

  • I'm not sure if I agree with most of the premises. Security and privacy require extensive concepts and include several measures. It's difficult to single out one detail and make absolute statements about it without taking into consideration the context and rest of the setup. Also both depend on the exact threat scenario and it's difficult to say anything on the matter without defining the threat scenario first.

    An old-fashioned adblocker has some advantages over the newer variants and DNS blocking. It can rewrite the websites and remove most trackers, ads and annoyances even if they're on the same host. A DNS blocker / VPN can only do that if the tracker runs on a dedicated, distinct domain. And many services nowadays don't do that. You lose those blocking abilities.
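
    A toy example of why that matters (hostnames and paths are made up): a DNS blocker can only answer yes or no per hostname, while an in-browser filter can also match the path, so a tracker served from the same host as the content slips past the DNS blocker.

    ```python
    # Made-up example: a tracker served from the *same* hostname as the page.
    # A DNS blocker only sees the hostname and must allow it or break the site;
    # an in-browser blocker can still match the path and strip just the tracker.
    from urllib.parse import urlparse

    dns_blocklist = {"tracker.example"}                # whole hostnames only
    content_filters = ["/analytics/", "/pixel.gif"]    # path patterns, uBlock-style

    def dns_blocked(url: str) -> bool:
        return urlparse(url).hostname in dns_blocklist

    def content_blocked(url: str) -> bool:
        return any(pattern in urlparse(url).path for pattern in content_filters)

    same_host_tracker = "https://news.example/analytics/pixel.gif"
    print(dns_blocked(same_host_tracker))      # False: blocking the host would break the site
    print(content_blocked(same_host_tracker))  # True: the adblocker can still remove it
    ```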

    Sure, an adblocker is software and thus has vulnerabilities and issues. But why make the cut here? Why not trust uBlock, which is open-source, used by millions of people, has more than one pair of eyes looking at it and a good track record... but trust the browser, which is a ridiculously complex piece of software with millions and millions of lines of code and runs with even more permissions? On top of an even more complex operating system that has access to everything and is often designed by companies that make a living by collecting user data?

    And I don't think a VPN is good per se. It also adds more complexity and a whole new company in the mix that now handles your traffic. Could be better than your ISP, could also be worse. Sure, it obscures your IP. But I'm sure most VPN providers have to abide by the law and do lawful intercept. As do internet service providers. So depending on the threat, there might not be any benefit over not using a VPN. And there are a lot of VPN offerings and different flavors. Not all of them are good. You could jeopardize your personal information by choosing the wrong one. It adds a layer of privacy under the condition that the company doesn't keep logs, doesn't collect user data and has their customer database and payment details decoupled from the network infrastructure.

    And the privacy of VPN use depends on other measures. If you use social media, log in to a Google account, check your mail, don't filter trackers, or use an Android or Apple phone that uses their services for push notifications, connectivity checks and all sorts of services... your VPN IP will be known to said companies. And/or your username or other identifiers. They can correlate data and analyze your behaviour pretty much the same way as if it were an ordinary internet connection. It doesn't help against browser fingerprinting, cookies etc. And the metadata that is, for example, collected by instant messengers or other "free" services is also the same and still tied to your account.

    I really don't see much of a benefit in using a VPN considering today's technology and the way online services and data collection work. And their DNS filter lists are still "badness enumeration", the same concept as the adblocker filter lists.

    And I always like to tell people that security and privacy aren't the same. Sometimes they even oppose each other. For example, you could be using a secure Linux distribution and a privacy-protecting browser. Now, without additional measures, you're easily recognized everywhere, because only a fraction of internet users use a setup like that. Combine that with a VPN and a nonstandard DNS provided by your VPN provider (and not 8.8.8.8 like most people type in) and you're singled out even more. (And using Google's DNS sends your requests to Google, so that's also not good.) There are additional techniques to mitigate this, in this example faking the user agent. But there are other techniques to invade privacy, other mitigations, and it's really a complex subject that doesn't have a simple answer.

    So if the statement is: uBlock doesn't provide absolute privacy nor security, I agree. The remaining statements are too simplistic and probably don't hold true in real-world scenarios.