
  • If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.

    Well, if OpenAI knowingly used pirated work, that's one thing. It seems pretty unlikely and certainly hasn't been proven anywhere.

    Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it's hard to make the case that they're really at fault any more than Google would be.

  • There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

    That's part of the allegation, but it's unsubstantiated. It isn't entirely coherent.

  • these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.

    That wouldn't be copyright infringement.

    It isn't infringement to use a copyrighted work for whatever purpose you please. What's infringement is reproducing it.

  • Maybe you don't care, but the OSI definition does.

  • In fairness, they didn't release anything open at all.

  • You're getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.

    First of all, copyright law does not care about the algorithms used and how well they map what a human mind does. That's irrelevant. There's nothing in particular about copyright that applies only to humans but not to machines. Either a work is transformative or it isn't. Either it's derivative or it isn't.

    What AI is doing is incorporating individual works into a much, much larger corpus of writing style and idioms. If an LLM sees an idiom used a handful of times, it might start using it where the context fits. If a human sees an idiom used a handful of times, they might do the same. That's true regardless of algorithm, and there's certainly nothing in copyright or common sense that separates one from the other. If I read enough Hunter S Thompson, I might start writing like him. If you feed an LLM enough of the same, it might too.

    Where copyright comes into play is in whether the new work produced is derivative or transformative. If an entity writes and publishes a sequel to The Road, Cormac McCarthy's estate is owed some money. If an entity writes and publishes something vaguely (or even directly) inspired by McCarthy's writing, no money is owed. How that work came to be (algorithms or human flesh) is completely immaterial.

    So it's really, really hard to make the case that there's any direct copyright infringement here. Absorbing material and incorporating it into future works is what the act of reading is.

    The problem is that as a consumer, if I buy a book for $12, I'm fairly limited in how much use I can get out of it. I can only buy and read so many books in my lifetime, and I can only produce so much content. The same is not true for an LLM, so there is a case that Congress should charge them differently for using copyrighted works, but the idea that OpenAI should have to go to each author and negotiate each book would really just shut the whole project down. (And no, it wouldn't be directly negotiated with publishers, as authors often retain the rights to deny or approve licensure.)

  • Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?

    Congress has been here before. In the early days of radio, DJs were infringing on recording copyrights by playing music on the air. Congress knew it wasn't feasible to require every song be explicitly licensed for radio reproduction, so they created a compulsory license system where creators are required to license their songs for radio distribution. They do get paid for each play, but at a rate set by the government, not negotiated directly.

    Another issue: who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains fewer words? If I self-publish a book, is it worth as much as Mark Twain's? Sure, his is more popular, but maybe mine is longer and contains more content; what's my payout in this scenario?

    I'd say no one. Just like Taylor Swift gets the same payment as your garage band per play, a compulsory licensing model doesn't care who you are.
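    The flat-rate model described above is worth spelling out. A minimal sketch, assuming an invented rate of 2 cents per use; the real rate would be set by the government, and every number here is purely illustrative:

```python
# Hypothetical compulsory-license payout, loosely modeled on the radio
# per-play system described above. The 2-cent rate is invented for
# illustration; under a compulsory license, the government sets it.
RATE_CENTS_PER_USE = 2

def payout_cents(uses: int) -> int:
    # Same rate for everyone: Taylor Swift and a garage band alike.
    # Popularity drives total payout only through the number of uses.
    return uses * RATE_CENTS_PER_USE

print(payout_cents(1_000_000) / 100)  # $20,000 for a heavily used work
print(payout_cents(500) / 100)        # $10 for a rarely used one
```

    The point of the model is that no one has to decide whose work is "worth more"; the rate is fixed and only usage varies.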

  • Isn’t learning the basic act of reading text?

    not even close. that’s not how AI training models work, either.

    Of course it is. It's not a 1:1 comparison, but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human's learning process, would that matter for you? I doubt that very much.

    Thousands of authors demand payment from AI companies for use of copyrighted works: Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools

    Having to individually license each unit of work for an LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.

    What we're broadly talking about is generative work. That is, by absorbing a body of work, the model incorporates it into an overall corpus of learned patterns. That's not materially different from how anyone learns to write. Even my use of the word "materially" in the last sentence is, surely, based on seeing it used in similar patterns of text.

    The difference is that a human's ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style, but I can only do that a handful of times in a lifetime. An LLM can do it almost infinitely, and that ability can then be reused by any number of other consumers.

    There's a case here that the remuneration process we have for original work doesn't fit well with AI training models, and maybe Congress should remedy that, but on its face I don't think it's feasible to just shut it all down. Something like a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.

  • Isn’t learning the basic act of reading text? I’m not sure what the AI companies are doing is completely right but also, if your position is that only humans can learn and adapt text, that broadly rules out any AI ever.

  • Basically credit card theft.

    Over twenty years ago, when I was pretty young and inexperienced, I answered a newspaper ad for IT/programming at a so-called "startup." It sounded great.

    My first day was in someone's living room-turned office and I didn't actually have any real idea what the business was. I was told it was a financial company, but it was taking off like gangbusters. Relatively quickly, within days actually, we moved into a very nice class-A office building. The owner was a remarkably charismatic man and being in his presence made you feel warm and understood and like you had a world of possibilities around you. I felt like a badass: I had a good-paying job, worked in a beautiful and prestigious office, and had a boss who made me feel great.

    I found out, however, he was basically just running a scam. Between about 2-4am, he would have TV spots running, selling naive housewives, unemployed breadwinners, alcoholics, etc. a "system" to earn huge sums of money very quickly. His system? You find people selling notes. You find people who want to buy notes. You introduce them and take a commission. A huuuuuuge commission.

    Was that illegal? I don't know. I kind of doubt the people in the ads were real, but my paychecks were clearing.

    I learned that when his sales people (who worked late at night, when the infomercials ran) took orders, they would record everyone's credit card info. Then, the owner directed us to automatically sign them up for things they didn't ask for -- recurring subscriptions to his membership-based "note marketplace" website. This was before the Internet was so mainstream, and many people buying this package didn't even have a computer.

    If people tried to place an order, and one credit card was declined, he'd just have them quietly try another card we had on file for them, without asking. If anyone complained, they'd obviously just refund the whole charge to avoid pissing off the credit card companies, but he was really just hoping no one would notice.

    I quit pretty quickly and got a "real" real job.

  • That still does touch on the problem. The RCS group was formed in 2007. Let that sink in.

  • I think if you try to have regulators come up with standards for things like airdrop or location sharing, it's going to be a bad time.

    RCS you can just regulate as a telecom feature. It's contained. It doesn't touch things like finance (which vary by country).

  • I honestly get it. Apple has been excruciatingly stubborn to adopt RCS.

    I think in the past this was excusable because RCS has been such a moving target. First it was the carriers disagreeing about how to implement it, and dragging their feet; then Google got tired of waiting for carriers and sort of bypassed them. But even then RCS is messy when it's part carrier, part Google, etc. Even Google Fi doesn't support RCS if you want its text-from-computer function working! Then came e2e encryption, which has been haphazard.

    At this point though, it is starting to solidify. Apple should implement it, and if Apple drags their feet, regulators should intervene. Don't rule out that happening in the EU, either.

    Well, no, because that sentence doesn't make any sense. There's no such thing as a "PPP adjusted GDP"; PPP is just a way of measuring GDP. I'm suggesting that if you want to use PPP to measure GDP, by all means, use PPP. PPP merely corrects for currency imbalances.

    In other words, if you don't like nominal GDP (valid), by all means, use PPP. Both PPP and nominal GDP are measures of GDP though.

    So: China spends around 1.7% of its total economic output directly on its military. The US spends closer to 3.5%.

    If the US spent what China does, as a percentage of GDP, that would be just shy of $400bn. A lot of money, for sure, but we're closing in on a $2.0 trillion budget deficit.
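    That figure is easy to check. A quick sketch, assuming a US GDP of roughly $23 trillion (an assumed base; the comment doesn't state one):

```python
# Back-of-envelope check of the spending figures above.
# The $23 trillion US GDP is an assumption, not from the comment.
us_gdp = 23e12

china_share = 0.017  # China: ~1.7% of GDP on the military
us_share = 0.035     # US: closer to 3.5%

print(us_gdp * china_share / 1e9)  # ~391 bn: "just shy of $400bn"
print(us_gdp * us_share / 1e9)     # ~805 bn at the US share
```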

  • I think you need to learn the concept of purchasing power.

    By all means, use PPP.

    My point was that GDP is not a useful metric, and I even gave you a concrete example of why.

    Use PPP if you prefer.

    You're the one who compared US spending to China's in absolute dollars.

  • The reality is that US spends more on military than the next 10 countries combined.

    Unless we're going to start paying soldiers $1,000/yr (roughly what China does), that's going to be the reality.

    For context, overall manufacturing output of the US is only around $1.9 trillion.

    And New York City's manufacturing output is almost nothing. Manufacturing isn't GDP.

  • It’s problematic. Plenty of instances host illegal content, and you don’t want that liability as an instance admin. Other instances do things that abuse the network, like post unmitigated spam.

  • One distortion of excessive national debt can be inflation. Good thing that’s not a conce—oh wait.