Fun with AI-generated images
At a glance…
Having been pointed to a site that uses advanced AI to produce the most amazing images from simple descriptions that you enter, the Author thought it would make sense both to draw attention to that 'gold dust' service and to share a small selection of the best images he's so far come up with there.
He also discusses the ethical issues involved, and advocates a sensible, pragmatic approach rather than a rigid, paranoid one. However, even that pragmatic approach requires an understanding of the fundamental and serious harm AI can do when used as any sort of substitute for individual creativity rather than as an extension of it. In Humanity's grossly dysfunctional state here on Earth at the current time, it's a given that the vast majority lacks such discernment and would use AI-generated images for all the wrong reasons, further destroying themselves as they've already been doing through regular TV watching and using smartphones as a substitute for getting a real (intelligent!) life.…
All rights reserved.
The basic info
In issue 645 of ComputerActive magazine here in the UK, which reached me in late November 2022, my attention was drawn to a system developed by the folk at OpenAI, which generates a huge range of image options / variations in response to a simple typed-in verbal description.
Initially I was inwardly scoffing at the thought of attempts to produce worthwhile images that way, except perhaps for some specialist purposes, but what really woke me up to its potential was a hilarious reader-submitted image produced with that system: Father Christmas painted by Van Gogh.
I've started experimenting myself, as I'd like some really wacky, creative and usually surreal images that could be good options in presentations of my own creative works (maybe, in certain cases, a new cover design for some of my novels), especially as I've always had a certain manual clumsiness that makes me rather a dead loss at drawing / painting. I also wanted to produce a creative image to accompany any promotion of a performance / recording of my organ and tuba composition The Unknown, which I understand is expected to get a public performance and live recording before very long.
In the case of the image reproduced above, I doubt whether that one would really be usable for the novel in question, The Hunting-Down of Michael Maus, but I find it impressive in its own right. Most likely, though, I'd be able to get images usable for that novel simply by being more specific in the descriptions I submit. That title in itself is an enormous ask of such an AI system, and I'm greatly impressed that it responded so cogently to such a creative challenge, and that it mimicked Salvador Dalí's style and, shall we say, rather dubious ethos so convincingly and indeed hilariously.
The AI system used for this is called DALL-E 2. On the OpenAI site it is presented as a freemium commercial online service. To use it, you first have to sign up for a free account. Currently you get 15 free credits per month, and can buy additional credits at any time, at the rate of US $15 per 115 credits (I'm not sure whether those are in addition to the 15 free credits per month). At the time of writing, I've used up my first 15 free credits and have to wait until 9 January before I can use my next 15, but I did get quite a lot of mileage out of those 15.
When working in a more focused manner rather than just trying the system out, I'd expect to get a reasonable amount of good work from 15 credits, but of course would then have to wait before I had more free credits.
One limitation I found is that the system can be a bit nannyish. It's good that it's programmed to refuse to generate certain harmful / unwholesome material, but it did seem a bit excessive that it refused to process 'A dog cocking its leg on a gatepost' and the title of my fourth novel, 'Still Life With Strangled Porcupines', presumably because of the 'strangled' bit (WTF??!).
Yes, and when I tried that title without the 's' word, all the images produced showed hedgehogs, not porcupines! A pity I couldn't find any means of telling the developers that a porcupine isn't a hedgehog at all, and that the two types of mammal aren't even closely related biologically!
Another current limitation is the image resolution, which is only 1024x1024px. I did find, however, that my photo editor, Photoshop Elements 2021, does a good job of up-scaling the images. Inevitably there is some quality loss when a small image is enlarged, particularly as any unsharpness gets magnified.
Some online research showed that the current version of Photoshop (not the simplified Photoshop Elements) is widely regarded as giving the best available preservation of image quality when enlarging (using its 'preserve details' option), but I don't have the resources for such expensive software. I did try a particularly highly regarded free online image up-scaling service (enlarging by 400%) and compared its results with what I got in Photoshop Elements, and I have to say they were almost identical, though Photoshop Elements won by a tiny margin (very slightly less unsharp). In both cases pixelation was not noticeable when the enlarged images were viewed full-size, even on straight edges at a 45° angle.
A minor annoyance was that each image had, at the extreme bottom right, a little strip of five colours that clearly had nothing to do with the image content. I've had to use Photoshop Elements' excellent spot healing tool (with the 'proximity match' option) to get rid of them.
I suspect (though don't actually know) that that little colour strip is put there deliberately to encourage people to buy credits. Surely paid-for generated images wouldn't have that little spoiler on them…?
Are there any downsides of using such an AI system?
Yes, an ethical one. However brilliant it is, it inevitably depends on having 'scraped' an enormous number and range of existing artistic works and people's photos, a high proportion of which were or still are in copyright, and it's liable at times to reproduce extant copyright works or very close variants of them, giving people the impression that by using those images they aren't violating the original artists' rights. This is all a potential legal minefield. Unfortunately, probably the majority of people would have no scruples about any copyright infringement they've committed through a third party such as OpenAI.
However, there's so much to gain from such systems that the real point is that the whole arena of the rights of visual artists needs rethinking, much in the way that has had to happen with regard to mass-distributed and copied music, so that means can be found to compensate original artists for use of their work in AI systems.
In the meantime, my approach to mitigating such concerns with regard to my own use of AI-generated images is a pragmatic one, based in a sensible rather than paranoid social responsibility:
- Not to sell them, though I can't rule out the odd one being used as part of some saleable item; but then I'm not a businessman and am not seeking to make significant money out of them, either directly or (generally) indirectly.
- Always to credit the images, mentioning OpenAI and the source artist to whatever extent the source of particular images or styles is known or assumed, so people are given the opportunity to understand that although the images could be said to be my own intellectual property, they would be so only in a very minor sense.
  By making money out of such images, people may in some cases be flagrantly violating an original artist's copyright, in spirit if not under the convolutions of current copyright law relating to AI-generated images; I myself have the good sense not to make copyright claims, actual or implied, relating to such material.
- Out of respect to artists whose style I'm using, if an image is generated that closely resembles a specific extant work of the respective artist, and that work is still in copyright, I'd not use it, at least not for any major public use, or where it could be expected to benefit me significantly financially.
And a final word on that subject: we need to keep in mind that every creative artist, whether or not they're aware of it or admit to it, draws at least to some extent on the works of others in their own works, albeit with a fair amount of 'distillation' occurring in the case of the truly accomplished artists. Indeed, I'm fairly sure that certain of my own music compositions are drawn at least in part almost verbatim from certain other composers' work, though in this case the earlier composers were apparently in prior planetary civilizations, not this one, and so no rights are being violated even in the event of a precisely accurate reconstruction!
For more about the origination of my music works, please see Musical Influences on Philip Goddard's Music & Literary Works.
… However, I soon lost interest, at least initially, in using AI for anything creative, except for the odd special purpose. The point is that, however effective it is, its applications tend to be basically utilitarian rather than fully humanly creative.
That may sound like rather an academic point, but actually it's much more than that: a fundamentally serious matter. If you regularly use AI to generate supposed 'art works' of any kind, or even simple everyday text, what you are doing in some measure is training your brain not to use your own intrinsic creativity and individuality. Frequent ongoing use of it could therefore be every bit as harmfully diminishing as sitting back and watching TV (or equivalent) rather than getting out there and living a genuine, authentic life of your own (sans vicariousness, perish the thought!). By such use of it, you could be degrading and shutting down some of your most precious and fundamental humanity. Is that really what you want?
However, during January 2024 I did find a use for it, having learnt that a major new version of DALL-E had become freely available as Bing Image Creator (see further below). It has turned out to be really useful, enabling me to start giving each of the YouTube videos of my Nature-Symphonies an artistic background image. Yes, not all the creativity in those is my own, but a significant proportion is. After all, the text I use to get Image Creator to do its now really impressive work is thought out on the basis of a deep creative involvement with the respective Nature-Symphony, and of course my choice each time from the alternatives offered is again based in my own depth of awareness and creativity. It's enabling me to extend my own ability, without passing off anything that isn't my own as my own.
A little exploration
Let's start, then, with a few further samples of what I tried with my first 50 free credits.
Magritte-style image titled The Awful Destiny of Physalia Gorgon
Again, this was a crazy-big ask for any AI system to handle, without some very specific programming / training, so I was simply curious to see what DALL-E would come up with when faced with such an outlandish challenge. The Awful Destiny of Physalia Gorgon is in fact the title of my third novel.
The name Physalia Gorgon has a pile of connections. Physalia is the generic name of a dangerous, rather jellyfish-like, floating colonial marine organism known as the Portuguese man o' war (Physalia physalis). The surname Gorgon also has an interesting connection, particularly with Medusa the Gorgon in Greek mythology, the sight of whose face would allegedly turn anyone to stone. Also, in biological terminology, a medusa (non-capitalized) is the technical term for what people usually call a jellyfish. So clearly I'd put a lot of insight and thought into the name of the hapless individual concerned.
In the event, I reckon DALL-E didn't get the Gorgon bit, but latched onto the biological bit and of course the cryptic surrealist Magritte style. I could almost use it, except that it completely leaves poor Physalia Gorgon himself out in the cold!
To get a stylized impression of Physalia Gorgon pictured by DALL-E (and maybe thus turn every subsequent visitor to this page to stone), I shall try something like 'Male teenager covered in fur and with prominent canine teeth', or 'Male teenager with the werewolf syndrome', likely specifying that he's crying. I'd also see what DALL-E makes of my specifying that the hair on his scalp is composed of snakes! Indeed, I might even be able to incorporate that unfortunate individual into the above image in Photoshop Elements in some useful way, such as by putting him in that framed inset.
In fact, in the novel Physalia has just fur, not snakes, but the image that I'm after is more about the popular image that might be conjured up by his name — for in the novel he is widely regarded as some sort of unspeakably hideous monster.
Picasso-style picture of tabby cat at foot of moonlit cross
DALL-E's attempts at that subject didn't come out effectively for my purpose, until I specified Picasso-style, when just one example was usable, though still not quite what I was really after.
I got that idea from the light-heartedly desolate Epilogue in my sixth novel, Forbidden Flood Warning. For a year or two now (in late 2022) I've been feeling a strong urge towards producing at least one creative work (presumably poem or/and piece of music) entitled Jeoffry the Cat's Seven Last Meows At the Foot Of a Moonlit Cross. — For me it has such an 'energy' and 'poetry' about it!
In November 2023 I created Nature-Symphony 19 (Crazy sad dance — Jeoffry the cat's seven last meows at the foot of a moonlit cross).
Jeoffry was the joint central character in that novel, and indeed was a tabby cat.
I love this image, and could use it, but if it were used with any intent to reflect the spirit of the Epilogue in that novel, I'd almost certainly remove the birds. Poor Jeoffry is left with just moonlight and memories of all that Humanity had thrown away in their mindless follies. He forever waits for his now dead and dismembered great friend and companion Geronwy to return, or for some now nonexistent bird or spider he could chase and even grab and eat, and so is whiling away his time by meowing his heart out!
Still, I expect I can get something closer to my original intention, probably by getting DALL-E to produce the elements separately, such as 'A tabby cat meowing at an invisible moon' and 'A full-size tall moonlit cross', and then combining them in my photo editor. Different styles could also be particularly effective; I wonder, for example, about Edvard Munch, renowned for his painting The Scream! Indeed, for a laugh I'll undoubtedly try 'The Meow, in the style of Edvard Munch'!
Another style I'd want to try for poor Jeoffry's seven last meows is that of a particularly relevant artist, the Italian Giorgio de Chirico, adaptations of certain of whose paintings are described in the novel as part of its weird but compelling structuring.
Magritte-style image of large cathedral organ with tuba small in foreground
I wanted one or more such images to be available for presentations of my music composition The Unknown, for organ and tuba, which I understand is due for a performance and recording before long. It was actually a devil to get anywhere near right. The chosen image was the best out of a lot of sort-of-interesting but rather wide throws, most of which could have their uses for somebody else, but were not good representations of what I was describing.
Even though I ended up specifying that the foreground tuba is 'entire', the example below was the most convincing choice I had. It doesn't show much of the organ, but at least here we're clearly looking at the bottom of very big pipes that would be towering over us. Magritte's iconic 'faceless' human forms weren't really wanted here, but in a way they work out rather effectively in relation to the composition's title. This image really does give the sense of being in a very large cathedral, even though we see hardly any of it, along with an uneasy sense of something 'unknown'.
I'd also like to have an image of a rather misty tuba suspended in the air above the island of Lundy, for the work starts with an imaginative nature painting of an experience I had on that island in 1979, which is recounted in my programme note for The Unknown. I wonder if DALL-E knows anything about Lundy — maybe it'll give me a great bunch of laundry images!
Using Bing Image Creator (DALL-E 3) in my further YouTube videos
Forget DALL-E 2! The new major version is truly amazing, even though it inevitably still gets some things wrong. Image quality is far superior now, and the system has advanced from 'interesting toy' status to a real professional-grade tool. The only reason not to regard it as 'pro' quality yet is that only one, pretty small, image size is available, but that's fine for images that are going to be used on web pages.
I started using that service for a background image for my Nature-Symphony 25, and have since used it to provide a background image for each successive one. See the whole list of my Nature-Symphonies, remembering that these images start with No. 25.
One regret is that I didn't get onto this in time for Nature-Symphony 19 (Crazy sad dance — Jeoffry the cat's seven last meows at the foot of a moonlit cross), for I got some quite 'masterful' results for that title offered to Image Creator. How about these three tries?…
As already noted, one practical limitation of these AI images is that currently they're all square, 1024x1024 pixels. To use them as background images for my videos they need to have a 16:9 aspect ratio, such as 1280x720px. While these images are of high enough quality to upscale very well indeed in current / recent versions of Photoshop Elements, the composition of most of them makes good use of the height, so that only a limited proportion of the images are even potentially usable for background images for my videos. Even where the cropped image does fit my purpose well, it's still generally far inferior to the full, uncropped composition.
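To put rough numbers on that crop: a 16:9 slice of a 1024x1024 square keeps the full width but only 576 of the 1024 pixel rows (56.25% of the height), and that slice then needs enlarging by 25% to reach 1280x720. The short Python sketch below just does that arithmetic. The function name centre_crop_box is hypothetical, and it assumes a centred crop, whereas in practice the crop position would usually be chosen by eye to suit the composition. The (left, top, right, bottom) box follows the convention used by, for example, the Pillow library's Image.crop.

```python
from fractions import Fraction


def centre_crop_box(width, height, aspect_w=16, aspect_h=9):
    """Largest centred box with the target aspect ratio, as (left, top, right, bottom).

    Hypothetical helper for illustration; assumes the crop is centred.
    """
    if Fraction(width, height) > Fraction(aspect_w, aspect_h):
        # Image is wider than the target: keep the full height, trim left and right.
        crop_w, crop_h = height * aspect_w // aspect_h, height
    else:
        # Image is taller than (or exactly matches) the target:
        # keep the full width, trim top and bottom.
        crop_w, crop_h = width, width * aspect_h // aspect_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# A square 1024x1024 AI image cropped for a 1280x720 video background.
left, top, right, bottom = centre_crop_box(1024, 1024)
crop_w, crop_h = right - left, bottom - top
print((left, top, right, bottom))                 # (0, 224, 1024, 800)
print(f"height kept: {crop_h / 1024:.2%}")        # height kept: 56.25%
print(f"upscale to 1280x720: x{1280 / crop_w}")   # upscale to 1280x720: x1.25
```

In other words, a centred crop throws away the top 224 and bottom 224 rows of the square, which is why images whose composition makes good use of the height rarely survive the crop well.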