segaloco
Experienced Member
- Joined
- Apr 30, 2023
- Messages
- 137
Yeah, the scraping is a pain. It's on the industry players to enforce some sort of ethics, which of course they refuse to do. Even if there were rules, though, I suspect they'd amount to a robots.txt situation: a suggestion that nobody materially holds anyone else accountable to.
I have opinions on the cost of standards that I could fill multiple replies with, so I'll just leave it narrowly at this: if an LLM does not in fact have access to a given standard governing something you're prompting it on, then imo nothing it says on the subject should be trusted. It should be held to the same standards as publications subject to rigorous peer review. Otherwise it's the same hearsay you'd get from some rando on Stack Overflow who also doesn't cite their sources. If someone wants to put forth their given AI of the day as a programming assistance tool, then it'd be nice if they demonstrated that they've ensured the tool has access to these sorts of specifications. Then it might justify the licensing terms involved in that sort of thing. Still, do you get into an Internet Archive situation where they have a finite number of licenses to the material but theoretically infinite requests to refer to the material? If the model isn't spitting back verbatim text, just analysis based on it, is that copyright friendly? That's one of the many reasons I just don't bother engaging: the legal frameworks haven't caught up yet, but you know there are lawyers out there just ready to strike the second they smell blood.