I thought about adding this to the other thread about AI art, but I think a separate topic is more appropriate, since that one seems focused on general opinions.
Note: this is my interpretation of how this stuff works, and I could be wrong on some points. Feel free to point them out. I'm not a legal expert or anything.
----------------------
So basically, from what I understand, the Copyright Office ruled that AI art cannot be copyrighted because only a human author can hold copyright. That means the work automatically falls into the public domain. Now for our question: if artists are free to use public-domain imagery, and can copyright their own version when it is altered enough from the source, wouldn't that also mean an artist can use AI art someone else generated for their own purposes?
As I understand it, when AI uses art, it tends to be one of these two things:
1) If the AI trains itself on datasets of art but does not generate art directly from the dataset, then arguably all it's doing is "looking" at the art, studying it, and then creating its own based on a generalization. That is largely what human artists do. Therefore that would allow it to dodge any legal grey areas because to call
2) If it does generate art directly from the dataset, as in using the pictures themselves, yet alters its image just enough to differentiate it, then it would arguably be an original piece and thus copyrightable.
Either way, you'd end up with an original piece that the AI created using human art. I don't know if it's fair, but I don't see it as illegal either, because calling it illegal would mean classifying a lot of human art as illegal too: most of what we make is, in one way or another, based on tons of art we see around us, whether consciously or subconsciously.
For AI creators, this is a double-edged sword. They can create as much art as they want and legally use artists' work as a basis, but the catch is that, because they didn't create anything themselves, they don't own whatever the AI created, as it falls into the public domain. Thus artists should be able to use AI-created works as a basis for their own art.
As it stands, I think this is more of a win for artists than a loss, as AI gives rise to a whole new way of looking at creativity and a new source for idea generation and referencing. AI creators can create artwork from artists' work, but there's also nothing stopping us from using AI art to make our own art.
Replies
1 and 2: It doesn't matter - robot work cannot be copyrighted.
Can you create derivative work from something a robot did and own the copyright?
yes, you can
It's not AI, it's a complicated filter. If we stop calling it AI, we'll stop attributing human characteristics to it and we won't feel the need to perform these kinds of mental gymnastics.
The real shit is a few years off, and it'll most likely wipe us out before we see it coming.
Now, of course, it *acts* as a filter (put something in, get something out), but the issue lies in the ethics and legality of how the content was acquired in the first place. What people want to do with it is largely irrelevant.
- - - - -
"Can you create derivative work from something a robot did and own the copyright?
yes, you can "
Well, only if the practice of IP laundering gets a free pass. This legal battle is far from settled. So the answer is not "yes you can", but rather: "by doing so, one is committing IP laundering. At this time, there is no legal precedent on this topic."
ML models are literally filters, so I disagree: facts are not disingenuous.
A neural net is a tree of binary choices. The complex behavior we see is achieved by brute force, i.e. there's a shit load of choices and a shit load of tweakable parameters that drive the choices (that's what the prompt affects).
My point is that we as a species should not attribute intent or intelligence to the models. That muddies the waters significantly, leads to questions like the ones raised in the OP, and distracts from the real point, which is, as you say, the ethical responsibility of those gathering training data.
In terms of the legal side:
Provided what you do is considered transformative enough, you can do this with copyrighted material too. I fail to see a difference.
The scraped art is (in most cases) intellectual property, and theft of intellectual property is a recognized problem. (https://www.europol.europa.eu/crime-areas-and-statistics/crime-areas/intellectual-property-crime)
Harvard Business Review has a great article explaining why this is a problem specifically about AI generators. https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
I guess scale is the practical issue: previously it took effort to steal a load of people's IP and sell it, and now the robots have done the work for you.
It's a delicate problem for legislators because a knee-jerk reaction to public outrage could cause a huge amount of harm. The freedom to create transformative work and commercialise it is the foundation of critique/review/history/parody/invention in general, etc.
'Art' includes all sorts of stuff outside the remit of Midjourney, and coming up with a definition that protects the things I mention above but prevents 'IP laundering' is going to be quite the exercise.
My concern is that we could end up with legislation that makes it impossible to build on or critique prior art - which would be pretty devastating both culturally and individually for many commentators/creators.
That aside, though, enforcement is a problem I'm not sure can be properly solved.
The training data is already stored in human-readable form because it's pictures people got off the internet, so nothing needs to be done there.
The result of the training is not human-readable and (to my admittedly slightly out-of-date knowledge) you cannot reverse the process to derive the training data. The closest we've got is using another model to guess what the sources are, and this is of course not reliable.
The upshot is that you have no way to prove whether the presented training data is what was actually used to train the model, which is a bit of an issue in court, I think.
In practice, generating a decent amount of training data is a shit load of work and will be out of reach for most organisations. They'll be buying datasets in, and it should be possible to regulate the people selling them (inasmuch as you can regulate anything).
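To illustrate the "can't reverse it" point, here's a deliberately oversimplified toy in Python. Averaging all pixel values into a single number stands in for "training" (an assumption purely for illustration; real image generators are vastly larger and work very differently). It only shows the many-to-one idea: two different datasets can end up with identical parameters, so the parameters alone can't tell you which dataset was used. It doesn't prove that real models can never leak anything about their training data.

```python
from statistics import fmean


def train_toy_model(images: list[list[float]]) -> float:
    """Toy stand-in for training (hypothetical, for illustration only).

    Collapses every pixel of every image into a single parameter: the mean.
    Lots of information is thrown away, which is what makes it non-reversible.
    """
    pixels = [p for image in images for p in image]
    return fmean(pixels)


# Two different hypothetical "datasets" of tiny 2x2 greyscale images (values 0 to 1)
dataset_a = [[0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
dataset_b = [[0.25, 0.75, 0.75, 0.25], [1.0, 0.0, 0.0, 1.0]]

model_a = train_toy_model(dataset_a)
model_b = train_toy_model(dataset_b)

print(model_a, model_b)    # prints: 0.5 0.5
print(model_a == model_b)  # True: the parameter can't tell you which dataset produced it
```

Given only the trained value 0.5, there's no way to work out whether dataset_a, dataset_b, or countless other image sets went in, which is the same evidentiary problem described above, just at a tiny scale.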