@julka I'll take all of your sarcastic replies and summarize them as 'I have no valid argument here, so I'll try to make fun of the messenger.' Because that's what you actually did - just made a sarcastic reply while ignoring the substance of the argument.
In debate, if you simply ignore the substance of an argument and make a personal attack in response, guess what? The person who advanced the argument is considered to have won that point. So, I'll take my win on all of those points and move on, mostly.
I'll engage slightly on libraries, though. Yes, libraries existed for many years without photocopiers. So, fine. Libraries without photocopiers may or may not constitute black boxes (Do they allow camera phones? Handwriting? If so, still a black box under your definition). My argument is that 'libraries with photocopiers' constitute a black box by your definition of a black box. Which is, in fact, pretty much what I said the first time. You keep trying to ignore that, but you can't actually give any substantive reason why they don't, only that 'they have a license!' (which, in fact, they do not for paper books - and, if that counted, any LLM trained using paper books would have the same 'license'). And they don't have a license for electronic books that allows copying, either.
The librarians undoubtedly control the inside of the library, so they should be liable for allowing copying, no? That's your standard elsewhere, and there is no 'license' which gives the librarians the authority to allow copying.
Now, what actual libraries generally do is post a sign near the photocopier saying 'Hey! These are the rules about copying! Follow them!' Is that sufficient? It seems reasonable to me. On the other hand, one could say it kinda makes it seem like the librarians are aware of the risk that their services will enable copyright violations and yet are choosing not to supervise the photocopier. Sneaky librarians!
But, by analogy, we could allow the LLM provider to post a notice in their Terms and Conditions: 'Our product may allow access to copyrighted information. It's your responsibility to not distribute any copyrighted information you receive from our product.' Problem solved, just in the way it's solved in libraries, right?
And note that LLMs might allow access to copyrighted information. Maybe. On the other hand, libraries absolutely allow access to copyrighted information. If the disclaimer works for library photocopiers, why in the world would it be insufficient for an LLM?
I'm intentionally dropping your point about DMCA takedowns except to note that, perhaps, you missed my suggestion that the LLM providers should also handle DMCA requests, and thus be just as fine as other providers, perhaps? If it's good everywhere else, why not in the case of LLMs?
Dang, that's wild! I had no idea that asking for, and receiving, a response from an LLM was a "gotcha" game! That's pretty surprising to me, I would have expected that a service put on the internet to answer queries was intended to answer queries given to it by the internet, but I guess that's not the case. Dang, you're teaching me so much!
Great job of setting up a strawman and knocking it down while ignoring the point. Yes, it's a gotcha game if the goal of making those queries is to bombard the provider of the LLM with DMCA takedown requests. Which, again, was the point I made very specifically, and the point which you didn't respond to while tilting vigorously at your strawman.
Strawmen: 0, julka 1. But actual arguments GW 3, julka 0 (counting the arguments you intentionally whiffed on with personal attacks).
Dang, that's wild! I had no idea that if rightsholders request content from somebody who doesn't have permission to distribute it, it's actually the rightsholder's fault if the person who doesn't have permission to distribute it gives it back to them!
I specifically said that wasn't the argument here and told you what the argument was.
Strawmen: 0, julka 2. But actual arguments GW 4, julka 0.
Can you explain to me how the LLM operator a) chooses to let the LLM search the web but also b) has no control over where on the web the LLM searches, and also can't choose whether or not the LLM searches the web?
Annnnnnnd the goalposts move hundreds of kilometers in a single bound! Before, you claimed there was no user data. Now, you're claiming that the provider could make there be no user data if they wanted to (at the cost of making the product much less useful, obviously). But, unlike the case of everything else that searches and retrieves information from the web, you want the LLM provider to be responsible, not the entity that published the copyrighted information on the web in the first place.
I was under the impression that if the operator is choosing to let the LLM search the web, then surely they should be responsible for the fact that it's searching the web, and furthermore they have some sort of ability to tell the LLM what sources are good or bad, and I would have expected that this would meant the LLM operator has absolute control over what the LLM is trained on and uses to generate its responses!
What a ludicrous argument! You're of the opinion that an LLM operator must create a canonical list of the entire internet and determine which sources are good and bad? Even Google doesn't so much as try to do that! Hence all of those DMCA takedowns you go on about. And, really, you want the LLM providers deciding what's 'good' and 'bad'? The implications of that are, honestly, horrifying.
So, your argument is that the LLM providers must undertake a task that so far outstrips the meaning of 'Herculean' as to boggle the mind - make a URL-by-URL determination, for the entire internet, as to whether a site is 'good' or 'bad' - before they can interact with the internet at all? And they have to do it even though the entire internet is constantly in flux? Perhaps you understand why I'm laughing at you. Loudly.
Strawmen: 0, julka 3. But actual arguments GW 5, julka 0.
As far as google chrome goes, you're just being a big silly - nobody made that argument! It has nothing to do with anything, you goofball! LMAO, stop making joke like that! I thought you had an actual point!
There's this thing called an analogy: 'a comparison between two things, typically for the purpose of explanation or clarification.' You might want to read up on them. By 'analogy,' you are saying that the exact thing Chrome does, and that its providers control it doing, is just fine, while if the LLM does the same thing, because its providers allow it to, it's awful and infringing. Even though, again, it's the same thing.
So, which is it? Is Chrome bad because it does the same thing the LLM does, or is the LLM fine because it does the same thing Chrome does? Pick one or the other, if you're not too tied up with learning what analogies are.
At this point, I will just assume you're completely fine with the solution to this being to allow rightsholders to issue DMCA challenges to the LLMs, since you haven't bothered to reply to it. No challenge; the LLM is fine. A challenge, to which the LLM provider doesn't take appropriate action, might be grounds for a lawsuit (with, obviously, refusal to respond to the challenge a key piece of evidence). Now, if the challenge is wrongful, the LLM provider shouldn't take action, and perhaps the entity issuing the wrongful challenge should be sanctioned, but those are details. We're good there? DMCA challenges solve the whole thing?
And I will also consider you to have conceded the point that the LLM's production of copyrighted materials is only actionable to the point where the copyrighted text is long, accurate, and has a meaningful negative impact on the marketability of the original work (all longstanding elements of Fair Use, and all adjudicated to apply to commercial use of excerpts of copyrighted works), since you haven't bothered to reply to that, either. If it meets those criteria, it's actionable infringing material. If it doesn't, it's not.
See? We can make progress! Now, all you have to do is find an extant LLM which can produce actionable infringing material without resorting to external sources and - wonder of wonders - I will actually agree that the LLM / provider might be committing a legally actionable infringement. And I hope that you, in turn, will agree that any LLM which cannot do that (which seems to be 100% of the currently extant LLMs) is not committing any legally actionable infringement, nor should it be considered to be doing so unless the rightsholders have made specific takedown requests against actionable infringing material, allowed adequate time, and had those requests not eventually satisfied.
Mind you, the 'post a sign' version seems sufficient, unless you think libraries are dangerous hotbeds of copyright infringement and librarians should be held responsible if anyone walks out of their library with a copy of a book. But doing it the more complicated way seems beyond question, no?