An Algorithm Told Police She Was Safe. Then Her Husband Killed Her.
madsen (@madsen@lemmy.world) · Posts: 1 · Comments: 94 · Joined 2 yr. ago
"so it's probably just some points assigned for the answers and maybe some simple arithmetic."

Why yes, that's all that machine learning is, a bunch of statistics :)
I know, but that's not what I meant. I mean literally something as simple and mundane as assigning points per answer and evaluating the final score:
// Pseudo code
risk = 0

if (Q1 == true) { risk += 20 }
if (Q2 == true) { risk += 10 }
// etc...

// Maybe throw in a bit of
if (Q28 == true) {
    if (Q22 == true and Q23 == true) {
        risk *= 1.5
    } else {
        risk += 10
    }
}

// And finally, evaluate the risk:
if (risk < 10) { return "negligible" }
else if (risk >= 10 and risk < 40) { return "low risk" }
// etc...

You get the picture.
And yes, I know I can just write "if (Q1) {", but I wanted to make it a bit more accessible for non-programmers.
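For the curious, the same idea as a runnable Python sketch. To be clear: the weights, the question interactions and the thresholds below are entirely made up for illustration, since the real VioGén values aren't public.

```python
# Hypothetical point-based risk scoring, mirroring the pseudocode above.
# All weights and thresholds are invented -- NOT the real VioGén values.

def risk_score(answers):
    """answers: dict mapping question number (1-35) to True/False."""
    weights = {1: 20, 2: 10, 3: 5}  # made-up points per question
    risk = sum(points for q, points in weights.items() if answers.get(q))
    # A made-up interaction rule, like the Q28/Q22/Q23 bit above:
    if answers.get(28):
        if answers.get(22) and answers.get(23):
            risk *= 1.5
        else:
            risk += 10
    return risk

def risk_label(risk):
    """Collapse the numeric score into the coarse categories."""
    if risk < 10:
        return "negligible"
    elif risk < 40:
        return "low risk"
    else:
        return "medium or higher"
```

A weighted sum plus a lookup table, nothing more; you could run it on a pocket calculator.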
The article gives absolutely no reason for us to assume it's anything more than that, and I apparently missed the part of the article that mentioned that the system had been in use since 2007. I know we had machine learning too back then, but looking at the project description here: https://eucpn.org/sites/default/files/document/files/Buena%20practica%20VIOGEN_0.pdf it looks more like they looked at a bunch of cases (2159) and came up with the 35 questions and a scoring system not unlike what I just described above.
Edit: I managed to find this, which has apparently been taken down since (but thanks to archive.org it's still available): https://web.archive.org/web/20240227072357/https://eticasfoundation.org/gender/the-external-audit-of-the-viogen-system/
VioGén’s algorithm uses classical statistical models to perform a risk evaluation based on the weighted sum of all the responses according to pre-set weights for each variable. It is designed as a recommendation system but, even though the police officers are able to increase the automatically assigned risk score, they maintain it in 95% of the cases.
... which incidentally matches what the article says (that police maintain the VioGen risk score in 95% of the cases).
The crucial point is: 8% of the decisions turn out to be wrong or misjudged.
The article says:
Yet roughly 8 percent of women who the algorithm found to be at negligible risk and 14 percent at low risk have reported being harmed again, according to Spain’s Interior Ministry, which oversees the system.
Granted, neither "negligible" nor "low risk" means "no risk", but I think 8% and 14% are far too high for those categories.
Furthermore, there's this crucial bit:
At least 247 women have also been killed by their current or former partner since 2007 after being assessed by VioGén, according to government figures. While that is a tiny fraction of gender violence cases, it points to the algorithm’s flaws. The New York Times found that in a judicial review of 98 of those homicides, 55 of the slain women were scored by VioGén as negligible or low risk for repeat abuse.
So in the 98 murders they reviewed, the algorithm put more than 50% of them at negligible or low risk for repeat abuse. That's a fucking coin flip!
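The arithmetic behind that claim is trivial to check, using the figures from the article:

```python
# Figures from the article: 98 homicides reviewed in the judicial review,
# 55 of the slain women scored as negligible or low risk.
reviewed = 98
scored_low_or_negligible = 55

share = scored_low_or_negligible / reviewed
print(f"{share:.0%}")  # prints "56%" -- worse than a coin flip
```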
I don't think there's any AI involved. The article mentions nothing of the sort, the system is at least ~~8~~ 17 years old (according to the article), and the input is 35 yes/no questions, so it's probably just some points assigned for the answers and maybe some simple arithmetic.
Edit: Upon a closer read I discovered the algorithm was much older than I first thought.
The article mentions that one woman (Stefany González Escarraman) went for a restraining order the day after the system deemed her at "low risk" and the judge denied it referring to the VioGen score.
One was Stefany González Escarraman, a 26-year-old living near Seville. In 2016, she went to the police after her husband punched her in the face and choked her. He threw objects at her, including a kitchen ladle that hit their 3-year-old child. After police interviewed Ms. Escarraman for about five hours, VioGén determined she had a negligible risk of being abused again.
The next day, Ms. Escarraman, who had a swollen black eye, went to court for a restraining order against her husband. Judges can serve as a check on the VioGén system, with the ability to intervene in cases and provide protective measures. In Ms. Escarraman’s case, the judge denied a restraining order, citing VioGén’s risk score and her husband’s lack of criminal history.
About a month later, Ms. Escarraman was stabbed by her husband multiple times in the heart in front of their children.
It also says:
Spanish police are trained to overrule VioGén’s recommendations depending on the evidence, but accept the risk scores about 95 percent of the time, officials said. Judges can also use the results when considering requests for restraining orders and other protective measures.
You could argue that the problem isn't so much the algorithm itself as it is the level of reliance upon it. The algorithm isn't unproblematic though. The fact that it just spits out a simple score: "negligible", "low", "medium", "high", "extreme" is, IMO, an indicator that someone's trying to conflate far too many factors into a single dimension. I have a really hard time believing that anyone knowledgeable in criminal psychology and/or domestic abuse would agree that 35 yes or no questions would be anywhere near sufficient to evaluate the risk of repeated abuse. (I know nothing about domestic abuse or criminal psychology, so I could be completely wrong.)
Apart from that, I also find this highly problematic:
[The] victims interviewed by The Times rarely knew about the role the algorithm played in their cases. The government also has not released comprehensive data about the system’s effectiveness and has refused to make the algorithm available for outside audit.
Yup. I remember it from when Atlanta hosted the Olympic games some time in the '90s. Despicable.
Didn't something similar happen in Turkey with Erdogan a few years back? Pretty sure he was accused of being behind it himself too; don't know what the final verdict was though.
I think it's a pretty common accusation, just like when a politician is attacked, someone will invariably suggest that they staged it in order to get more support.
I read every single word of it, twice, and I was laughing all the way through. I'm sorry you don't like it, but it seems strange that you immediately assume that I haven't read it just because I don't agree with you.
This is such a fun and insightful piece. Unfortunately, the people who really need to read it never will.
I get notifications for calls (obviously), SMS messages (of which I receive an average of 1 per month) and IMs from my immediate family. Everything else I check up on when I actually feel like I have the time for it. This has dramatically reduced the number of emails and other things I forget to reply to/act on, because I see them when I want to and when I have the time to actually deal with them; not when some random notification pops up when I'm doing something else, gets half-noticed and swiped away because I'll deal with it later.
Cloud Saves may be difficult to deal with, depending on what games you play.
According to the headline, it's CISA urging users to either update or delete Chrome; it's not coming from Chrome/Google itself. However, I'm having trouble finding the actual CISA alert, and it's not linked in the article as far as I can tell.
Permanently Deleted
Fair enough, and thanks for the offer. I found a demo on YouTube. It does indeed look a lot more reasonable than having an LLM actually write the code.
I'm one of the people that don't use IntelliSense, so it's probably not for me, but I can definitely see why people find that particular implementation useful. Thanks for catching and correcting my misunderstanding. :)
Permanently Deleted
I'm closing in on 30 years too, started just around '95, and I have yet to see an LLM spit out anything useful that I would actually feel comfortable committing to a project. Usually you end up having to spend as much time—if not more—double-checking and correcting the LLM's output as you would writing the code yourself. (Full disclosure: I haven't tried Copilot, so it's possible that it's different from Bard/Gemini, ChatGPT and what-have-you, but I'd be surprised if it was that different.)
Here's a good example of how an LLM doesn't really understand code in context and thus finds a "bug" that's literally mitigated in the line before the one where it spots the potential bug: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-for-intelligence/ (see "Exhibit B", which links to: https://hackerone.com/reports/2298307, which is the actual HackerOne report).
LLMs don't understand code. It's literally your "helpful", non-programmer friend—on steroids—cobbling together bits and pieces from searches on SO, Reddit, DevShed, etc. and hoping the answer will make you impressed with him. Reading the study from TFA (https://dl.acm.org/doi/pdf/10.1145/3613904.3642596, §§5.1-5.2 in particular) only cements this position further for me.
And that's not even touching upon the other issues (like copyright, licensing, etc.) with LLM-generated code that led to NetBSD simply forbidding it in their commit guidelines: https://mastodon.sdf.org/@netbsd/112446618914747900
Edit: Spelling
Permanently Deleted
I wouldn't trust an LLM to produce any kind of programming answer. If you're skilled enough to know it's wrong, then you should do it yourself, if you're not, then you shouldn't be using it.
I've seen plenty of examples of specific, clear, simple prompts that an LLM absolutely butchered by using libraries, functions, classes, and APIs that don't exist. Likewise with code analysis where it invented bugs that literally did not exist in the actual code.
LLMs don't have a holistic understanding of anything—they're your non-programming, but over-confident, friend that's trying to convey the results of a Google search on low-level memory management in C++.
Nowhere does he say that he doesn't believe in Wunterslash, so I'm cool with him.
I don't think anyone should fear for their lives because of their opinions regardless of how stupid they are.
Edit: It's pretty fucked up that this is somehow controversial...
Couldn't you just program it to start (and stop) at a given time, or make a note of how long it says on the display that it'll take?
It seems (to me) like a very, very minor improvement for a huge cost, namely that your washing machine is on your network and is internet connected.
It's not Mozilla's CEO that's doing anything shady here, it's a partner company, OneRep.
Edit: And Mozilla is breaking up with OneRep because of it. (Just in case someone had missed that part.)
Using a password manager I’d have to copy-paste or remember each password. Not all have a web interface.
Then pick one that has a web interface or a CLI; Bitwarden has both and is free, and KeePass databases can be hosted on your NAS and accessed via CLI tools. There are plenty of options.

Or use passphrases (which are just as good as, or better than, complex passwords) and just type them? I use Bitwarden for literally every password/lock code/PIN that I have, and I have plenty of Pis and other things that don't let me easily log into Bitwarden, but finding "Excentric4-Waxing-Adopted-Giraffe" on one device and typing it on another really isn't much of a hassle.

(Also, why not just SSH into your Pis? Then you only need to worry about accessing a password manager on the machine you're opening the SSH connection from.)
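A back-of-the-envelope sketch of why passphrases hold up, assuming words are picked at random from a Diceware-style list of 7,776 words and random passwords are drawn from the 94 printable ASCII characters (both assumptions are mine, not from the comment above):

```python
import math

def entropy_bits(pool_size, length):
    """Bits of entropy for `length` independent random choices from `pool_size` options."""
    return length * math.log2(pool_size)

# Random 12-character password from 94 printable ASCII characters:
password_bits = entropy_bits(94, 12)   # ~79 bits

# Passphrases from a 7776-word (Diceware-style) list:
four_words = entropy_bits(7776, 4)     # ~52 bits
six_words = entropy_bits(7776, 6)      # ~78 bits
```

So a randomly generated six-word passphrase is in the same ballpark as a 12-character random password, and it's a lot easier to read off one screen and type on another.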
From the comments on this post it seems that you're mostly looking for validation of the idea you originally had rather than actual feedback on how secure that idea is. You're obviously free to manage your passwords exactly as you want, but this idea of a "base password" is objectively less secure than the alternative put forward by many people in these comments, namely to use the Yubikey to log into a good password manager that then handles all the different (completely unique) passwords.
There are always instances where doing things the best and most secure way is more cumbersome, and it's up to you to decide if you want all of your passwords to be poor (and difficult to change, in this case) just because you occasionally need to log into something that doesn't neatly integrate with a password manager.
Your point is valid regardless but the article mentions nothing about AI. ("Algorithm" doesn't mean "AI".)