I wanted to share my concern with you. You have all heard about Midjourney, which is extremely hyped. The Discord server is insanely crowded and the results people get from the AI are becoming crazier every day: very realistic results, stylized ones, and anything in between. The AI is so sophisticated that it creates amazing and beautiful artwork following all the common principles that please the eye (it uses a lot of complementary color palettes, for example).
Convolutional Neural Networks
In case you don't know, Midjourney is built on deep neural networks, and convolutional neural networks are a type of network that works especially well for images. (As far as I know, the team has not published the exact architecture.) These networks learn over many training iterations, gradually getting better at producing images.
To train a neural network to become this sophisticated, the dataset must be MASSIVE. It probably includes MILLIONS of images.
The datasets are the most crucial part of this. I'd say 90% of the "amazingness" of Midjourney comes from their choice of dataset. So... they don't just have any dataset of random images; they must have datasets of artwork.
I have my concerns about Midjourney and their use of artwork. I've been asking the devs which datasets they used to develop their network, but they have not been transparent (it's a business secret, I suppose). They only mentioned a famous image dataset, one of the largest freely available ones, and stated that other "private" datasets were used as well.
Where did they get those artworks from?
They for sure did not ask or pay 1 million artists to use their artwork.
What they do is essentially mimic the styles and creations of other artists.
They do not copy art
It is true, however, that the art Midjourney generates is unique. It generates the images from the KNOWLEDGE it gained from the artworks in the dataset. It is still a gray zone to me, and it raises questions about where intellectual property in art ends and where interpretation begins.
It is apparent that they used a lot of artists' works to train their AI. Signatures even appear in the results sometimes, but they are never readable. (Midjourney is not really capable of rendering text in its results.)
Assumption 1 - Data Mining
My assumption is that the devs used data mining techniques: bots crawl web pages and download their content in reusable formats, usually to scrape data into a database and build datasets from it.
This would mean they used sources like DeviantArt, ArtStation, or forums like Polycount to scrape artists' works and create a dataset full of amazing art.
I see no other way they could have achieved this level of sophistication and artistic skill with an AI.
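To make the crawling idea a bit more concrete, here is a minimal Python sketch of the step where a bot reads a page and collects the image links on it. This is purely my own illustration and not Midjourney's pipeline, which isn't public. The sample page, the example.com URLs, and the names `ImageLinkCollector` and `collect_image_urls` are all made up for this example, and it only parses a string instead of downloading anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag that has a src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page's own URL.
            self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(html, base_url):
    """Hypothetical helper: return all image URLs found in an HTML string."""
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    return collector.image_urls


# Made-up gallery page (assumption: not the markup of any real site).
sample_page = """
<html><body>
  <img src="/art/dragon.jpg" alt="Dragon">
  <img src="https://cdn.example.com/art/castle.png">
  <img alt="no source">
</body></html>
"""

urls = collect_image_urls(sample_page, "https://gallery.example.com/user/42")
for url in urls:
    print(url)
```

A real crawler would repeat this over many pages and then download and store each file, which is what turns a list of links into a dataset.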
Assumption 2 - Investor bought them a dataset
Another possibility would be that they had an investor who helped them buy a huge dataset from a company like Epic (ArtStation) or DeviantArt (owned by Wix). This would not make much of a difference in terms of artists' rights to their own work, but it would be a legally safe move for them, since these platforms can probably build and use massive anonymized datasets from the artworks uploaded to them.
My mind is racing these days. I love Midjourney; it's incredibly fun. It is a nice break from programming or creating materials in Substance Designer, both of which take much longer before you see results. Creating art in Midjourney is fun and fast, and it just works incredibly well.
But my concerns about artists' rights are growing, and since there is no transparency about any of this, my gut says that something is going on.
Let me know your thoughts in the comments.
Here are some of the results from a few weeks of using Midjourney: