As a civil matter, the publishing houses are more likely to get the full money if Anthropic stays in business (and does well). So it might be bad, but I'm really skeptical about bankruptcy (and I'm not hearing anyone seriously float it?)
Plaintiffs made that argument and the judge shot it down pretty hard. That kind of competition isn't what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy to create MANY more competitors in the fiction market. Could an author use copyright to challenge that use?
Would love to hear your thoughts on the ruling itself (it's linked by Reuters).
Depends on the content and the method. There are tons of ways to encrypt data, and under relevant law they may still count as copies. There are certainly weaker NN models where we can extract a lot of the training data from the model parameters, even if it's not easy, and even if we can't find a prompt that gets the model to regurgitate it.
I'm still looking for a good reason to believe critical thinking and intelligence are taking a dive. It's so very easy to claim the kids aren't all right, but I wish someone would check. An interview with the GPT cheaters? A survey checking that those brilliant essays aren't just from people using better prompts? Let's hear from the kids! Everyone knows nobody asked us when we were being turned into ungrammatical zombies by spell check, grammar check, texting, video content, iPads, and the calculator.
I felt this way until recently; I'm becoming much more aware of how limited our collective attention is. Every honest belief probably deserves to have one (maybe three) reasonable people listen to it. But they definitely aren't all worth national/state/city/expert attention.
If you wanna go the extra mile, skimming an ally guide for 10 minutes and looking up some terminology and concepts would reduce awkwardness by a fair bit. I certainly would have avoided a half dozen missteps if I had done some reading.
As I understand it, there are many, many such models, especially ones made for academic use. Some common training corpora are listed here: https://www.tensorflow.org/datasets
Examples include Wikipedia edits and discussions, and openly licensed scientific articles.
Almost all research models are going to be trained on stuff like this. Many of them have demos, open code, and local installation instructions. They generally don't have a marketing budget. Some of the models listed here certainly qualify: https://github.com/eugeneyan/open-llms?tab=readme-ov-file
Both of these are lists that are not so difficult to get on, so I imagine some entries have trouble with falsification or mislabeling, as you point out. But there's little reason for people to do so (beyond improving a paper's results, I guess?).
Art generation seems to have had a harder time, but there are Stable Diffusion equivalents that used only CC-licensed work. A few minutes of searching found Common Canvas, which claims to have been competitive.
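For what it's worth, actually pulling one of those academic corpora (the first link) is only a few lines. A minimal sketch, assuming the tensorflow-datasets package; the config name below is illustrative and worth double-checking against the catalog:

    # Minimal sketch: load an openly documented corpus from the TFDS catalog.
    # Assumes `pip install tensorflow tensorflow-datasets`; the dataset/config name
    # ("wikipedia/20201201.en") should be checked against https://www.tensorflow.org/datasets
    import tensorflow_datasets as tfds

    ds, info = tfds.load("wikipedia/20201201.en", split="train", with_info=True)
    print(info.description)            # provenance and licensing notes ship with the dataset
    for example in ds.take(1):         # each record is a dict of string tensors
        print(example["title"].numpy().decode("utf-8"))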
I think it's fine for this to be poorly defined; what I want is something aligned with reality beyond op-eds. Qualitative evidence isn't bad, but I think it needs to be aggregated rather than left as anecdotes. Humans are really bad at judging how the kids are doing (complaints like the OP are older than liberal education, no?); I don't want to continue the pattern. A bunch of old people worrying too much about students not reading Shakespeare in class is how we got the cancel culture moral panic; I'd rather learn from that mistake.
A handful of thoughts:
There are longitudinal studies that interview kids at intervals; are any of these showing really weird swings?
Some kids got AI earlier than others; are they much different from similar peers without it?
Where are the broad interviews/story collections from the kids? Are they worried? How would they describe their own and their peers' use of AI?
As far as I can tell, the strongest data is on literacy and numeracy, and both of those are continuing linear downward trends that started before AI; am I wrong? We're also still seeing kids who went through lockdown, which seems like a much more obvious 'oh, that's a problem' than the AI stuff.
Honest question: how do we measure critical thinking and creativity in students?
If we're going to claim that education is being destroyed (and show we're better than our great^n grandparents complaining about the printing press), I think we should try to have actual data instead of these think-pieces and anecdata from teachers. Every other technology that the kids were using had think-pieces and anecdata.
Not a number theorist, but the Wikipedia article reads OK to me, so I'll give it an attempt. Answer based on the AMS's Translations of Mathematical Monographs 240, by Kazuya Kato et al.
A sample of the questions class field theory wants to address:
a) Which primes p are the sum of two squares, p = a^2 + b^2?
b) What about other formulae, say e.g. p = a^2 + 2b^2?
c) Consider a Galois extension of number fields. Take a prime ideal P in the ring of integers of the smaller field. For which P does this ideal factor when we look at the larger ring of integers?
d) When is the factorization square free (unramified)?
e) What's the smallest cyclotomic extension that contains sqrt(M) for a given M?
If we look at the integers, you may already know the answers to several of these! And they all have something kinda magic in common. For (a), for example, the primes that are the sum of two squares are exactly those with p = 1 mod 4. For example, 5 = 2^2 + 1^2, yet 7 cannot be written as a sum of two squares. The answer to question (b) is similar! We can do it exactly when p = 1, 3 mod 8.
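If you want to see (a) and (b) concretely, a brute-force check lines the congruence conditions up with the representability claims. A quick sketch, assuming sympy is installed:

    # Sketch: for odd primes p, check that
    #   p = a^2 + b^2   is solvable  <=>  p % 4 == 1
    #   p = a^2 + 2b^2  is solvable  <=>  p % 8 in (1, 3)
    from sympy import primerange

    def is_square(n):
        return n >= 0 and int(n ** 0.5) ** 2 == n

    def sum_of_two_squares(p):
        return any(is_square(p - a * a) for a in range(1, int(p ** 0.5) + 1))

    def a2_plus_2b2(p):
        return any(is_square(p - 2 * b * b) for b in range(1, int((p / 2) ** 0.5) + 1))

    for p in primerange(3, 1000):
        assert sum_of_two_squares(p) == (p % 4 == 1), p
        assert a2_plus_2b2(p) == (p % 8 in (1, 3)), p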
For (c), for concreteness let's take the extension of the rationals Q to the rationals with a square root of -3, Q(sqrt(-3)). The prime ideal (7) factors as (7, 2 - sqrt(-3)) (7, 2 + sqrt(-3)) (a product of two distinct prime ideals; unramified), as do the ideals (13), (19), (31), and (37). But (5), (11), (17), (23), and (29) all don't. Perhaps you notice a pattern: p = 1 mod 3? Factors. p = 2 mod 3? Doesn't. There's also a unique ramified prime, (3) = (sqrt(-3))^2; there will generally only be a finite number of ramified primes.

Do a dozen more examples and you'll notice a spooky pattern: the ramified primes seem to show up in the modulus. In this example, 3 was ramified and the factorization pattern works mod 3; if 7 and 23 were the ramified primes, the factorization cases would work modulo 7*23 = 161. [Quadratic extensions are not special, btw; the factorization of (p) in Q(zeta_5) (Q with a 5th root of 1) depends on p mod 5.]
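The splitting pattern itself is easy to check numerically: for primes other than 2 and 3, (p) splits in Q(sqrt(-3)) exactly when -3 is a square mod p. A small sketch, again assuming sympy:

    # Sketch: for primes p > 3, "(p) splits in Q(sqrt(-3))" <=> -3 is a square mod p,
    # and the observed pattern is that this matches p % 3 == 1.
    from sympy import primerange
    from sympy.ntheory import legendre_symbol

    for p in primerange(5, 500):
        splits = legendre_symbol((-3) % p, p) == 1   # is -3 a quadratic residue mod p?
        assert splits == (p % 3 == 1), p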
On the face of it, why would modular arithmetic be the relevant condition? And why does the modulus seem to care about ramification?
A major result of Galois theory is that there's a correspondence between subgroups of (Z/NZ)* (integers modulo N under multiplication) and intermediate field extensions between Q and a cyclotomic extension Q(zeta_N). Prime ideal ramification and factoring can be stated in terms of this correspondence. Further, the Kronecker-Weber theorem shows that every finite abelian extension of Q lives inside some Q(zeta_N). These results let us explain all of (a)-(e). Generalizing them is one of the big motivations of class field theory: if we start not with Q but with, say, Q(sqrt(-3)), what still holds? What is the right generalization of cyclotomic extensions and (Z/NZ)*?
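Stated a little more precisely (my paraphrase of the standard statements, not a quote from the book):

    % The cyclotomic correspondence and Kronecker-Weber, stated loosely.
    \[
      \operatorname{Gal}\bigl(\mathbf{Q}(\zeta_N)/\mathbf{Q}\bigr) \cong (\mathbf{Z}/N\mathbf{Z})^{\times},
      \qquad \sigma_a : \zeta_N \mapsto \zeta_N^{a},
    \]
    \[
      \{\text{subfields } \mathbf{Q} \subseteq K \subseteq \mathbf{Q}(\zeta_N)\}
      \longleftrightarrow
      \{\text{subgroups } H \leq (\mathbf{Z}/N\mathbf{Z})^{\times}\},
    \]
    % and, writing K for the fixed field of H, for any prime p not dividing N:
    \[
      p \text{ splits completely in } K \iff (p \bmod N) \in H.
    \]
    % Kronecker-Weber: every finite abelian extension of Q sits inside some cyclotomic field.
    \[
      L/\mathbf{Q} \text{ finite abelian} \implies L \subseteq \mathbf{Q}(\zeta_N) \text{ for some } N.
    \]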
My understanding is that this program is quite successful. There's a replacement for both that's only somewhat more technical/tedious, and it gives similar results. One of the bigger successes is generalizing 'reciprocity' laws (the quadratic case is often taught in undergrad number theory; it's the surprising fact that whether p is a square mod q depends on whether q is a square mod p).
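For reference, the quadratic case being alluded to, for distinct odd primes p and q (in Legendre symbols):

    % Quadratic reciprocity for distinct odd primes p and q.
    \[
      \left(\frac{p}{q}\right)\left(\frac{q}{p}\right)
      = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}},
    \]
    % so knowing whether p is a square mod q pins down whether q is a square mod p:
    % the two answers agree unless p = q = 3 (mod 4), in which case they are opposite.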
As one of the few folks who have asked such questions, I'm obviously against it. I don't think the dedicated pol communities are particularly good for honest questions about platforms/political figures; everything in those spaces feels like it's being intentionally spun (even in discussions) in a way that this community does not. (Also, several of the communities you suggest as pol discussion places are... just not? Extremely few questions, most of the posts are headlines, and discussions don't seem to happen much. Some feel closer to a curated feed of cringe.)
I do agree it could become an issue, and that would justify some division (perhaps tags?). But I don't think it is currently very unpleasant, and it will almost certainly get better in two months (at least in the short term).
I think the scary thing is if it takes the suppliers more than 3 days to figure that out. Companies oftentimes can last 3 days without food (and rarely fix things very quickly at any scale).
That one seems kinda scary: if inflation were 6% and something weren't being sold at any profit, all stores would stop selling it. (This is true for most food.)
I would love to see the source on this one. It sounds fascinating.