AI Slop

I occasionally get some comments and/or questions about why I have abandoned the “old ways” of Visions of Chaos for AI. I wanted to clarify a few things and have a post to point the next person who complains to.

Ever since I started coding Visions of Chaos, even in the days before I even released it to the public, users will have their favorite modes and be disappointed when I stop coding that feature and move on to the next thing.

If you are of the “AI Slop” crowd and have a general dislike of AI, then you see that Visions of Chaos has hundreds of machine learning modes and you are not happy. You want me to work on the older modes, be that cellular automata, fractals, shaders, reaction diffusion, etc.

Personally I find AI fascinating. The majority of my Patrons are mainly interested in the machine learning modes too. If you do have a capable GPU on your PC then I really recommend you try all of the available machine learning functionality to get an idea of what is possible.

Do not think of machine learning and AI as the one new feature in Visions of Chaos. Rather think of it as a back end to allow hundreds of interesting new modes and scripts to run inside Visions of Chaos.

If you really still have that AI slop dislike then I have now added an install option to not install any of the machine learning if you want to. If you cannot ignore that Machine Learning mode menu you can now install Visions of Chaos without ever having to see any mention of AI or machine learning.

For the rest of you, I am sure you are looking forward to the future advancements in the AI models like I am.

Jason.

Local Vibe Coding

Local Vibe Coding

Vibe Coding” is a new term that means you create code by describing what you want in normal English. You explain to an AI what the “vibe” is you are going for and wait for the AI to do the coding for you.

For a while the only was to try vibe coding was to use one of the large cloud based paywalled AI systems. Recently I have been experimenting with some locally runnable code models that will work on a 24 GB consumer level GPU (4090 or 5090). Visions of Chaos has always been about getting all of these AI systems running locally without needing any paid for cloud subscriptions.

llama.cpp to the rescue

I have added support for llama.cpp into the latest versions of Visions of Chaos. llama.cpp runs various LLM models locally on your GPU.

The first model I tried in llama.cpp was gpt-oss:20b. This model is one of the newer “chain of thought” models. This means when you have asked it a question the AI does multiple thinking passes to process your prompt more deeply. In this way it can (in theory) give better quality answers to your prompts.

Testing The Coder Capabilities

For the following tests I asked gpt-oss:20b to create single html file apps. This way it can generate a self contained javascript program in an html page that anyone can easily open in a browser to see.

Clocks Something simple to start. “write me an html page. the contents of the page should be a clock that shows the current hour, minute and seconds in a hand drawn style. there can also be a hand drawn classic clock with hour and minute hands to also show the time that way.”

7 Segment Display Clock A 7 segment display. This took a few extra prompts for appearance, but got there in the end. Notice how the segments have that slight fade out at the edges like a real clock does from the LED fading out behind each segment.

Tic-Tac-Toe Something with simple smarts. “write me an html page that plays a game of tic tac toe. the computer should play strategic moves and not just random choices. have a restart button when the game ends. keep track of the scores for win, lose and draw games.”

Reversi More complex. The AI stated that the black strategy was to always make the move that takes the most pieces, but it still beat me just with that. Only this prompt. “write me an html page. it should be the game reversi or othello. the computer should be the black player. who goes first should be random.”

Reversi another Reversi from the Qwen3.6-35b model.

Connect Four and a nicer version generated with Gemma4:26b Connect Four I used to be really good at Connect Four and could only be beaten by other pro players. Nothing to be proud of as it was just the result of way too many games while drinking way too much at the pub. This AI result plays well and beat me. Although it has been years since I played seriously. The AI coded a very good strategy. “write me an html page. it should be a game of connect four with the human vs the computer. the computer should use strategies to beat the average player.”

Maze How about a maze generator. This one had a few extra prompts to change colors and layouts.
“write me an html page that creates and solves mazes. size the maze to fit the window. give options for how many pixels each maze cell is.”

Traffic Dodger A simple car game.

Neon Breakout Block breaker game.

Starship Defender R-Type horizontal shooter clone. Try the extreme difficulty if you find it too easy.

Tank Wars An AI update of my old TankWar game.

Retro Worm Game The classic worm eats the food and grows game.

Sudoku AI got this working first go. A few extra tweaks helped get it to this stage.

Mandelbrot Zoomer A realtime zoomer into the Mandelbrot set. Click and hold the point you want to zoom into. Limited precision so it does not go too deep.

2D Gravity Gravity simulator that handles 25,000 without a slow down.

If you want to see the code the AI generated for any of the above, open them, right-click and view source, or right-click the links and download the html file.

Using AI as a Code Reviewer

My next test was to give it some of my old code and see if it can help. I gave it my complete code for 2D SPH (Smoothed Particle Hydrodynamics) fluid simulations and asked if it could review the code and recommend any ways I could speed up the code.

It went off “thinking” for a few minutes and then gave me a surprisingly detailed result with ideas from simple code tweaks to reorganizing how data was aligned in memory.

Using just the simple tweaks got the code speed down from approx 13 seconds per frame to 5 seconds per frame. Those fixes were very basic “derrr, I should have seen that” type fixes, but this is code that I have spent a lot of time going over trying to improve the speed of in the past. 3 minutes with the AI and it is now almost 3 times as fast.

I did try giving the AI full control and asked it to make all the changes it thought would help. The new rewrite of the code was a mess that had more compiler errors than help.

I think when using these coder systems locally you want to minimize the code it is working on or ask it exact specifics as you go. If you are a coder already it helps. Then you can ask it more specific questions like “I have traced the slowest performance of this code to the DoTheLongHardParts function. Can you look at it and see what you can suggest to improve speeds”?

Solving My Decade Old Problem

This is where it really convinced me it was helpful. For many years now I have on and off tried to solve the problem of searching for interesting cellular automata. See this post for all the details. Once I had the gpt-oss:20b model setup I asked it to code me a search utility for “interesting”. After a few hours over two afternoons it had coded two javascript apps for 2D and 3D CA searching that do find interesting rules. When you have spent 10 years on an issue and AI solves it in a few hours it really is mind blowing and shows how that when it works it can be very useful.

Updating Web Design

For a while now I wanted to make some changes to my website. Specifically I wanted these expandable/collapsible tables that the Visions of Chaos page now has.

Visions of Chaos

Before the collapsible tables the user had to scroll down a long way to get to the download link as there were so many modes and user comments listed.

With only a few passes of my CSS layout with GPT-OSS:20B I had the tables I wanted collapsible, the new blue colors, and the more rounded tables. I did still need to do a few manual tweaks, so it does help if you know the syntax the AI is writing.

It Is Not Perfect

Vibe coding is not perfect. Especially when running on the limited VRAM of a local GPU, but it is amazing for what it can do. The way it can get 95% of the app I ask for and not have a single syntax error is amazing. It may need a few extra prompts to tidy up issues and bugs.

But, like all AI, it will never say “sorry, I don’t know what that is”. Rather it assures you it can help and happily hallucinates code that works, but misses the point of what it is supposed to do.

Try It Yourself

If you have a capable 24 GB GPU (3090, 4090 or 5090) then I highly recommend you try it. It is a virtual zero cost coder you can get advice from at any time.

Download Visions of Chaos and install the llama.cpp mode. Start with the gpt-oss-20b or the gemma4-26b model. Ask it whatever you like. Feed it some code or text to review. Get it to write you a simple or not so simple game.

A tip for getting started. JavaScript apps in a single html file are good to get used to vibe coding. Try using text like the following to prompt…

Create me a javascript app in a single html file.
It should -describe what you want it to code here-
Include a debug ability that will catch and display errors or crashes as text so I can copy/paste the errors to help debugging.

For the description, be as explicit as you can. If you want your app to have a specific look, describe it. If you want some other features describe them. You can always start with a very vague description and reprompt to improve the results, but if you spend a few minutes describing what you want it can get there quicker.

For more complex apps then the request for debug text can really help. JavaScript apps will just hang when they have an error. That does not really help the AI fix the problem. “it crashes when I click the XYZ button” is not as helpful as being able to copy/paste a stack trace for the AI to see exactly what went wrong.

Jason.

Giving AI Total Creative Control For Short Movies

Visions of Chaos now supports setting up llama.cpp. llama.cpp supports Gemma4:26b. Gemma4:26b is one of the “thinking” chain of thought models and gives impressive results. I wanted to test how well it could help write scripts for short movies based on a simple prompt. These are the results.

I had just finished watching The Holy Mountain so I wanted to try a more surreal movie. Using the Gemma4:26b model in llama.cpp I prompted it to create the prompts for 20 scenes in a surreal movie inspired by Jodorowsky and Lynch. Like Lynch I will keep the story or the point of the film a secret. Make your own conclusions, but this movie is the result of rendering the AI prompts using LTX 2.3. Gemma also created the prompts for the music that ACE-Step-xl-1.5 generated.

Next up was a qatsi style movie. I thought this was the perfect use for these AI Text-to-Video short movies. The prompt was “i want to make a new text-to-video movie in the style of the qatsi movies. 30 scenes with an overarching structure to them like Koyaanisqatsi or Baraka. can you help?” That was basically it. As the human in this creation my job was to render batches of each prompt looking for the best ones to add to the final movie.

Lastly, a more arty Sci-Fi short. The prompt this time was “i want you to help write a sci-fi text to video movie. the theme should be how humanity is destroyed by AI. starting with pre-AI times, going through the AI advancement and finally the destruction of humanity. 30 scenes of 10 second length each. the visuals should be a more arty sci-fi movie like Solaris or 2001 and not a flashy sci-fi like Star Wars.”

For this one I did have to tweak prompts more and ask the AI to change prompts as some of them either did not suit the movie or LTX had problems creating a good result from the prompt, but overall this would still be 90% of just the AI. With Gemma, you can just ask “can you change the xyz prompt for me?” and it will happily come up with a few alternatives. One prompt it originally wrote was a basic sunset I changed and one prompt was for swings in a playground swinging without anyone on them, but no matter how many times I tried LTX was not able to make normal looking swings so that needed to be changed.

All of those were generated 100% locally on a 4090 GPU. No cloud based subscriptions needed.

I was really impressed with how relatively easy this was. LTX 2.3 is the only “problem” here as it does still generate messy outputs or has trouble with certain prompts. You can see from the movies above it does a good job when it works (it is by far the best local only Text-to-Video system at the time of writing and it handles HD resolution 10 second movies easily), but the need to render large batches of a prompt waiting for that suitable result took some time. I will revisit this when LTX improves or the next great local Text-to-Video system is released.

Although these tests were following what the AI wanted as closely as possible, it could easily be used for tweaking existing ideas or giving a bunch of ideas for new movies if you have writers block.

Jason.

Text-to-Image Summary – Part 9

This is Part 9. There is also Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.

This will be the last of these Text-to-Image Summary posts. Beyond these models all of the newer models are about the same quality results.


AuraFlow v0.1
Author: Original script by FAL
Original script: https://huggingface.co/fal
Time for 1024×1024 on a 4090: 53 seconds
Description: A new model. Very early stages and needs much more training. The following samples are cherry picked best results from a random prompt batch run.

a bronze sculpture of an evil alien 4K photo and hyperrealistic

a lush rainforest

a palace trending on pixiv and lens flare

an expressionist painting of a waterfall

an eyeball

computer generated imagery of dinosaurs

decoupage of a school of tropical fish by Daniel Maclise and Ram Chandra Shukla

digital art of an atoll

graphite pencil of a lion

metalpoint of an ocean


AuraFlow v0.2
Author: Original script by FAL
Original script: https://huggingface.co/fal
Time for 1024×1024 on a 4090: 53 seconds
Description: Updated AuraFlow model. Still being trained. The following samples are cherry picked best results from a random prompt batch run.

a cinematic painting of a beautiful woman by Anton Möller and János Saxon-Szász

a comic book panel of a large waterfall

a cute monster made of crystals and copper CryEngine and #film

a low poly render of a sea monster

a pop art painting of Harry Potter made of wood and liquid metal

a watercolor painting of Teletubbies

an archipelago by Phil Foglio and Yan Hui Flickr and rendered in Cinema4D

Dracula made of chrome and lego IMAX and hyperrealistic

humans vivid colors and IMAX

psychedelic of an eyeball made of wrought iron and vines


Flux
Author: Original script by Black Forest Labs
Original script: https://huggingface.co/black-forest-labs/FLUX.1-schnell
Time for 1024×1024 on a 4090: 25 seconds
Description: Handles text well and manages different sized images without duplicating subjects like other models can tend to do.

a cinematic painting of a cityscape

a cute creature for sale on Facebook Marketplace and psychedelic

a sea made of flowers and wood

an android filmic and psychedelic

an angry man made of liquid metal and plastic

an animation of a happy family

an attractive woman

an ugly monster

Harry Potter

The Grinch


Kolors
Author: Original script by Kolors Team
Original script: https://github.com/Kwai-Kolors/Kolors
Time for 1024×1024 on a 4090: 20 seconds
Description: Great results, very fast.

a beautiful woman

a cute girl CGSociety and filmic

a frog

a nightmare creature

a rough seascape

a shack CGSociety and 4K HD realism

an alien

an impressionist painting of the tropics

an ugly creature

kittens CryEngine and psychedelic


AuraFlow v0.3
Author: Original script by FAL
Original script: https://huggingface.co/fal
Time for 1024×1024 on a 4090: 54 seconds
Description: Third update to the AuraFlow model.

a flemish baroque of Miss Piggy by Laurits Tuxen and George Reid

a mountain path

a robot

a sea monster

an eyeball trending on Flickr and trending on ArtStation

an octopus

an ugly creature by Andre de Krayewski and Aleksander Kobzdej rendered in unreal engine and rendered in Cinema4D

an ugly woman

pixel art of Godzilla hyperrealistic and hyperrealistic

stencil of a portrait of a young boy


Stable Diffusion v3.5
Author: Original script by Stability AI
Original script: https://huggingface.co/stabilityai/stable-diffusion-3.5-large
Time for 1024×1024 on a 4090: 40 seconds
Description: Seems to create a lot of grid/dot/texture artefacts on generated images. Not too impressive compared to their earlier models.

a cubist painting of kittens

a sorceress trending on Flickr and rendered in unreal engine

a zombie rendered in Cinema4D and Flickr

abstract expressionism of a demon

an alien

an expressionist painting of a house

eyeballs by Ruth Bernhard and John Matson

Freddy Krueger

ink wash painting of a happy child

steampunk art of Teletubbies


CogView3
Author: Original script by THUDM
Original script: https://github.com/THUDM/CogView4
Time for 1024×1024 on a 4090: 42 seconds
Description: Works OK. Nothing spectacular.

a bronze sculpture of Frankenstein

a cubist painting of humans

a lush rainforest by Katsuhiro Otomo and Jörg Immendorff

a renaissance painting of a werewolf

a watercolor painting of the Sydney Harbour Bridge

an art deco painting of a knight

an ugly creature

pixel art of an attractive man

poster art of a castle

stencil of an ugly woman made of vines and liquid metal 8K 3D and 4K HD realism


CogView4
Author: Original script by THUDM
Original script: https://github.com/THUDM/CogView4
Time for 1024×1024 on a 4090: 1 minute 17 seconds
Description: Updated version 4 of the CogView model.

a fantasy land

a hyperrealistic painting of a rough seascape

a japanese painting of a cute monster

a Pixar character

a tardigrade hyperrealistic and trending on pixiv

an abstract painting of a tundra lens flare and trending on pixiv

an ugly face

egyptian art of a forest path

flesh made of silver and plastic by Elena Guro and Robert Lee Eskridge

screen printing of reflective spheres made of metal and wrought iron CryEngine and trending on ArtStation


HiDream-I1-nf4
Author: Original script by hykilpikonna
Original script: https://github.com/hykilpikonna/HiDream-I1-nf4
Time for 1024×1024 on a 4090: 1 minute 35 seconds
Description: Provides dev, full and fast model options. Full is the slowest but worth the wait for the superior results.

a cute girl

a forest path

a happy clown

a mill

a mountain Range

a movie monster made of vines and plastic vivid colors and 8K 3D

a sorcerer

an octopus

poster art of a lion made of cardboard and vines

poster art of a man


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

AnimateDiff Prompt Travel

AnimateDiff

One of the recent additions I added to Visions of Chaos was AnimateDiff. Git repository is here if you want to see the code or more info. AnimateDiff generates short 2 second movies at 8 frames per second, 16 frames total. This is due to the model being trained on a bunch of movies that were only 2 seconds long. The generation process takes around 1m13s per movie on a 4090 and uses around 15.3 GB of GPU VRAM.

You can see some cherry picked results here. Not every result you try will be that good. I include a batch button in Visions of Chaos so you can run the same prompt over multiple random seeds to generate a bunch of outputs on the current prompt. That way you can come back after a while and look for the best result.

The first question people ask (or at least everyone who tried it in Visions of Chaos did) was “how do I make longer and larger sized videos”?

AnimateDiff Prompt Travel

In comes AnimateDiff Prompt Travel. The dev worked out how to merge the shorter 2 second clips into longer movies and it handles larger resolutions too. For the simplest usage you give a list of frame numbers and text prompts and the script does the rest. This script takes around 12.8 GB VRAM when running.

The settings for both of those movies are included with Visions of Chaos so you can create them on your own PC and tweak the prompts to anything else.

Art Games

On the Softology Discord there is the art-games channel. The purpose of the channel is for users to take another user’s Text-to-Image output, change up to two words and post the new image. This continues on and slowly evolves new subjects of the images. I took a bunch of these prompts and ran them through AnimateDiff Prompt Travel. This is the result.

These are the prompts that change every 4 seconds of movie time.

Laughing watching collapsing scifi insanity
Laughing dog watching collapsing house scifi insanity
Laughing astronaut dog watching collapsing house planet scifi insanity
Laughing astronaut above collapsing planet scifi insanity
tranquil astronaut above futuristic planet scifi insanity
tranquil dragon above futuristic planet steampunk insanity
Tranquil dragon above futuristic planet
xenomorph butterfly above futuristic city
garden airships above futuristic city
Garden spheres above futuristic city
psychedelic_spheres_above_futuristic_sky
glass spheres above a sky
glass sphere containing a galaxy
glass spheres containing a galaxy
glass spheres containing a cute creature
glass cage containing a cute spider
glass nicholas cage containing a cute spider monkey
glass wonderland landscape containing a cute spider monkey
glass wonderland landscape containing a cute spider astronaut
wonderland landscape containing a cute alien robot
Wonderland landscape containing a cute quokka
Wonderland landscape burning a cute scarecrow
large crow burning a cute scarecrow
large crow burning a cute scarecrow on halloween
large dragon burning a cute castle on halloween
Large dragon burning a Spring castle on grass
Large dragon eating a Spring roll on grass
pixar dragon eating a Spring roll on cgsociety
pixar mouse eating a Spring salad on cgsociety
beksinski mouse eating a decaying salad on cgsociety
disney mouse eating a decaying franchise on cgsociety
disney mouse driving a decaying car on cgsociety
humanoid mouse driving a cyberpunk car on cgsociety
humanoid tree driving a cyberpunk car on mars
Humanoid tree driving a green car on asphalt
bonsai tree in a green car on asphalt
rainbow tree in a green pot on asphalt
rainbow unicorn melting in a green pot on asphalt
rainbow unicorn marshmallow melting in a green pot on asphalt instagram
rainbow robot, marshmallow smoking in a green pot on instagram
Handsome robot, marshmallow bouncing in a green pot on instagram
Rainbow Robot wearing a jingasa smoking a green pot, Artwork, Golden Hour, High Contrast, 3D, Feng Shui, volumetric Light, Iridescent, Brushed Aluminum
Handsome marshmallow robot bouncing in a green suit on a piano
Bionic marshmallow robot in a green suit stomping on a piano
Bionic elephant robot in a green suit stomping on a bridge
Bionic elephant robot in a green tuxedo dancing on a bridge
bulbuous elephant male in a green tuxedo dancing on a bridge
bulbous knight male in a green armor dancing on a bridge
bulbous knight male in green armor dancing on a tank
warcraft knight male in green armor driving on a tank
warcraft knight male in green armor driving on a tesla
warcraft knight male in green armor driving on a tesla
warcraft pig male in shiny armor driving on a tesla
warcraft pig male in shiny armor flying on a dragon
warcraft pig male in shiny armor fighting a dragon
warcraft pig female in shiny armor fighting a big dragon
warcraft pig male in shiny armor fighting a dragon
lego pig female in shiny armor fighting a big pumpkin
lego cat female in black armor fighting a big pumpkin
weird cat female in black armor inside a big pumpkin
steampunk cat female in black armor inside a big warehouse
steampunk cat female in black armor inside a rich bank
fat cat aristocrat in black armor inside a rich bank
fat cat aristocrat in black armor inside a rich bank
fat cat aristocrat wombat in matte black armor inside a rich bank
fat cat aristocrat wombat in matte black suit inside a robbed bank
fat cat aristocrat wombat in matte black suit driving a robbed Cadillac
fat cat shooting rat in matte black suit driving a robbed Cadillac
fat cat shooting rat in matte black suit driving a futuristic Cadillac
fat cat rat in matte black space suit driving a futuristic Cadillac
fat cat in matte black space suit driving a futuristic Cadillac in disney
fat cat in matte black space suit driving a futuristic Cadillac book in disney
fat cat in matte black space suit driving a futuristic book in disney France
fat cat in matte black space suit driving a book in renaissance france
cat in matte black hat driving a helicopter in renaissance france
cat in matte black hat robe driving cleaning a spaceship in renaissance Mars
cat in matte black robe watering a garden in renaissance Mars
cat in matte black robe watering a garden in Arizona desert renaissance Mars
cat in matte black robe eating a cactus in Arizona desert
cat in fluffy robe eating a cactus in Arizona desert
zombie in fluffy robe eating a brain in Arizona desert
zombie in fluffy robe eating a donut in Chicago desert
zombie in fluffy bikini eating a donut in chicago street
Zombie in fluffy city eating donut in street
Zombie in city eating pizza in street
zombie in city partying in street
zombie in Rome partying in museum
Zombie in Rome studying in party
shark in Rome feasting in party
dapper shark in Rome feasting in situ
dapper koala in Sydney feasting in situ
dapper axolotl in Chinatown feasting in situ
satanic axolotl in Chinatown meditating in situ
satanic sheep in cafe meditating in situ
satanic sheep in cafe meditating in space
giant blob in cafe meditating in space
giant viking in a cafe meditating in space
scary clowns in a cafe meditating in space
hairy clowns in a car meditating in space
hairy starfish in a fishbowl meditating in space
lairy starfish in a fishbowl cogitating in space
alien starfish in a helmet cogitating in space
alien creature in a helmet cogitating in space moebius
alien monk in a kasaya cogitating in space moebius
alien monk in a kasaya cogitating in labyrinth esher
dumfounded alien child in a kasaya cogitating in labyrinth esher
alien creature in a helmet cogitating in space moebius
dumbfounded alien child in a kayak cogitating in labyrinth escher
dumbfounded alien puppet in a kayak conflagrating in labyrinth escher
dumbfounded alien puppet in a kayak conflagrating in labyrinth escher, damien hirst
disturbing alien puppet in a bubble conflagrating in labyrinth escher, damien hirst
Cute alien puppet in a bubble conflagrating in labyrinth escher, damien hirst

Tutorials

The Future of AI Movies

AI movie creation is advancing quickly like image generation did before it. It won’t be long before everyone can generate their own movies at home with finer control of the imagery produced.

Jason.

Instant Neural Graphics Primitives (NeRF)

A quick post showing some steps to get NeRF going in Visions of Chaos to help first time users.

Step 1 – Training

1. Create a new empty directory for your trained data eg D:\Nerf Test\
2. Create a directory under that called images eg D:\Nerf Test\images\
3. If you have a series of images you know will work for training, put them under images. Otherwise, you can copy the images from C:\Users\YourUserName\AppData\Roaming\Visions of Chaos\Examples\MachineLearning\Instant Neural Graphics Primitives\data\nerf\fox\images\.
4. Start Visions of Chaos and select Mode->Machine Learning->Mesh Generation->Instant Neural Graphics Primitives
5. Set the source to be D:\Nerf Test and click Train.
6. Wait for the training to finish. For the fox images on a 3090 it took around 3 minutes.

NeRF

Step 2 – Viewing

With the Source location still pointing to D:\Nerf Test you can now click View to start the viewer GUI.

If you used the fox images you will see the point cloud of the trained data like the following. Middle mouse button click and drag to slide the model around. Left click and drag to rotate.

NeRF

Step 3 – Creating a Movie

Lastly you can now create a movie of a virtual camera moving around the 3D point object.

1. Let the points accumulate enough to see a reasonable image that is not too noisy.
2. Scroll down in the settings dialog and expand Snapshot.
3. Click Save.

Now to make the camera path. By default the path dialog is hidden behind the main dialog, so click and drag the main dialog out of the way.
When you have the Camera Path dialog showing, move the camera (middle click and drag, left click and drag) to the position you want your movie to start at.

1. Click Add from cam to add that point.
2. Rotate and zoom to another location and once again click Add from cam.
3. Do this another few times to create the camera key frames.
4. Once you added all the points click Save to save the path.
5. You can now close the GUI.
6. With the Source directory still set to D:\Nerf Test click Movie.

By default it will create a 15 second movie at 30 fps at a size of 1280×720. You can change these settings if you wish.
The movie frames will be created …

NeRF

…and the movie will play when finished.

The movie is saved under your specified Scene directory.

Train your own images

See the fox images as an idea of images to use. You want a series of images rotating around the subject showing it from all sides you want to see in the final movie.
You can also use a movie to train from of your subject rotating. The movie frames will be extracted for you and then trained as normal.

Jason.

Text-to-Image Summary – Part 8

This is Part 8. There is also Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 9.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Deforum Stable Diffusion v0.4
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support. v0.4 is the latest version.

'a canal' Deforum Stable Diffusion v0.4
a canal

'a forest path' Deforum Stable Diffusion v0.4
a forest path

'a loft' Deforum Stable Diffusion v0.4
a loft

'a matte painting of a river hyperdetailed and CryEngine' Deforum Stable Diffusion v0.4
a matte painting of a river hyperdetailed and CryEngine

'a painting of the tropics' Deforum Stable Diffusion v0.4
a painting of the tropics

'a pastel of a nightmare 4K HD realism and trending on Flickr' Deforum Stable Diffusion v0.4
a pastel of a nightmare 4K HD realism and trending on Flickr

'a photorealistic painting of Cookie Monster rendered in unreal engine and CGSociety' Deforum Stable Diffusion v0.4
a photorealistic painting of Cookie Monster rendered in unreal engine and CGSociety

'a tropical beach by Karl Hagedorn and Michalis Oikonomou' Deforum Stable Diffusion v0.4
a tropical beach by Karl Hagedorn and Michalis Oikonomou

'an etching of King Kong' Deforum Stable Diffusion v0.4
an etching of King Kong

'concept art of Gandalf CGSociety and 4K HD realism' Deforum Stable Diffusion v0.4
concept art of Gandalf CGSociety and 4K HD realism

lovecraftian cthulhu tentacle horrors by giger and beksinski, highly textured, 8K 4K HD

roses in the rain, rosebuds, rain drops, 8K 4K HD


Name: Deforum Stable Diffusion v0.5
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support. v0.5 is the latest version.

'a castle' Deforum Stable Diffusion v0.5
a castle

'a cute monster' Deforum Stable Diffusion v0.5
a cute monster

'a fine art painting of humans rendered in unreal engine and trending on pixiv' Deforum Stable Diffusion v0.5
a fine art painting of humans rendered in unreal engine and trending on pixiv

'a pop art painting of Frankenstein by Kim Hwan-gi and Zha Shibiao' Deforum Stable Diffusion v0.5
a pop art painting of Frankenstein by Kim Hwan-gi and Zha Shibiao

'a sorceress by Adolf Fényes and Rodolfo Morales for sale on Facebook Marketplace and trending on ArtStation' Deforum Stable Diffusion v0.5
a sorceress by Adolf Fényes and Rodolfo Morales for sale on Facebook Marketplace and trending on ArtStation

'a watercolor painting of a farm by József Breznay and John Zephaniah Bell' Deforum Stable Diffusion v0.5
a watercolor painting of a farm by József Breznay and John Zephaniah Bell

'an eagle made of feathers and silver' Deforum Stable Diffusion v0.5
an eagle made of feathers and silver

'an ugly face' Deforum Stable Diffusion v0.5
an ugly face

'puppies' Deforum Stable Diffusion v0.5
puppies

'street art of Jason Vorhees' Deforum Stable Diffusion v0.5
street art of Jason Vorhees

colorful surrealism by dali, giger, beksinski and haeckel

nebula galaxy planets hubble


Name: Deforum Stable Diffusion v0.6
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum-art/deforum-stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support. v0.6 is the latest version.

'a bedroom' Deforum Stable Diffusion v0.6
a bedroom

'a bronze sculpture of Robert DeNiro rendered in unreal engine and trending on Flickr' Deforum Stable Diffusion v0.6
a bronze sculpture of Robert DeNiro rendered in unreal engine and trending on Flickr

'a chinese painting of a peacock by Agnes Lawrence Pelton and Bob Thompson' Deforum Stable Diffusion v0.6
a chinese painting of a peacock by Agnes Lawrence Pelton and Bob Thompson

'a cute girl 4K HD realism and 8K 3D' Deforum Stable Diffusion v0.6
a cute girl 4K HD realism and 8K 3D

'a fine art painting of a palace made of mist' Deforum Stable Diffusion v0.6
a fine art painting of a palace made of mist

'a green tree frog' Deforum Stable Diffusion v0.6
a green tree frog

'a lion' Deforum Stable Diffusion v0.6
a lion

'a storybook illustration of the Australian outback' Deforum Stable Diffusion v0.6
a storybook illustration of the Australian outback

'ballpoint pen art of Frankenstein' Deforum Stable Diffusion v0.6
ballpoint pen art of Frankenstein

'Brad Pitt by Rhea Carmi and Robert Bechtle' Deforum Stable Diffusion v0.6
Brad Pitt by Rhea Carmi and Robert Bechtle

beauty, 4K, 8K, HD, hyper detailed, high detail, surrealism

an oil painting by Picasso and van Gogh, 4K, 8K, HD, hyper detailed, high detail, surrealism


Name: Stable Diffusion v2
Author: Original script by Robin Rombach et al
Original script: https://github.com/Stability-AI/stablediffusion
Time for 768×768 on a 3090: 42 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: Uses a newly trained version of the Stable Diffusion model that renders native at 768×768. The following examples show 768×768 sized output.

'a cave' Stable Diffusion v2
a cave

'a detailed painting of fear IMAX and Flickr' Stable Diffusion v2
a detailed painting of fear IMAX and Flickr

'a digital rendering of a human made of chrome and gold' Stable Diffusion v2
a digital rendering of a human made of chrome and gold

'a mansion' Stable Diffusion v2
a mansion

'a portrait of a sad clown' Stable Diffusion v2
a portrait of a sad clown

'a spooky forest' Stable Diffusion v2
a spooky forest

'a storybook illustration of a lush rainforest for sale on Facebook Marketplace and #film' Stable Diffusion v2
a storybook illustration of a lush rainforest for sale on Facebook Marketplace and #film

'an etching of a babbling brook' Stable Diffusion v2
an etching of a babbling brook

'an oil painting of a castle in the mountains' Stable Diffusion v2
an oil painting of a castle in the mountains

'Yoda' Stable Diffusion v2
Yoda


Name: Stable Diffusion v2.1
Author: Original script by Robin Rombach et al
Original script: https://github.com/Stability-AI/stablediffusion
Time for 768×768 on a 3090: 42 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: Updated Stable Diffusion model. The following examples show 768×768 sized output.

'a detailed drawing of Frankenstein' Stable Diffusion v2.1
a detailed drawing of Frankenstein

'a forest clearing' Stable Diffusion v2.1
a forest clearing

'a frog hyperrealistic and photorealistic' Stable Diffusion v2.1
a frog hyperrealistic and photorealistic

'a mountain cabin' Stable Diffusion v2.1
a mountain cabin

'a sad clown' Stable Diffusion v2.1
a sad clown

'a surrealist sculpture of eyeballs' Stable Diffusion v2.1
a surrealist sculpture of eyeballs

'a swamp hyperdetailed and rendered in unreal engine' Stable Diffusion v2.1
a swamp hyperdetailed and rendered in unreal engine

'a townhouse photorealistic and lens flare' Stable Diffusion v2.1
a townhouse photorealistic and lens flare

'an ink drawing of Al Pacino' Stable Diffusion v2.1
an ink drawing of Al Pacino

'an ugly creature' Stable Diffusion v2.1
an ugly creature


Name: Deforum Stable Diffusion v0.7
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum-art/deforum-stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 768×768 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 640×576
Description: Now supports Stable Diffusion v2.1 model for 768×768 resolution.

'a cave' Deforum Stable Diffusion v0.7
a cave

'a forest clearing' Deforum Stable Diffusion v0.7
a forest clearing

'a lion' Deforum Stable Diffusion v0.7
a lion

'a pastel of Big Bird by John Blair and Christoph Ludwig Agricola CryEngine and 4K HD realism' Deforum Stable Diffusion v0.7
a pastel of Big Bird by John Blair and Christoph Ludwig Agricola CryEngine and 4K HD realism

'a tributary' Deforum Stable Diffusion v0.7
a tributary

'a watercolor painting of a western town trending on ArtStation and Tri-X 400 TX' Deforum Stable Diffusion v0.7
a watercolor painting of a western town trending on ArtStation and Tri-X 400 TX

'a werewolf' Deforum Stable Diffusion v0.7
a werewolf

'an abstract painting of Gandalf' Deforum Stable Diffusion v0.7
an abstract painting of Gandalf

'an alien forest IMAX and vivid colors' Deforum Stable Diffusion v0.7
an alien forest IMAX and vivid colors

'an engraving of a cute girl' Deforum Stable Diffusion v0.7
an engraving of a cute girl

a hyperrealistic matte painting of melting color, 4K, 8K, HD, high detail, hyper detailed

a hyperrealistic matte painting of a lush rainforest, 4K, 8K, HD, high detail, hyper detailed

a hyperrealistic matte painting of a magical glowing mushroom forest at night, 4K, 8K, HD, high detail, hyper detailed


Name: Kandinsky v2.1
Author: Original script by AI Forever
Original script: https://github.com/ai-forever/Kandinsky-2
Time for 768×768 on a 3090: 1 minute 14 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: A new alternative script to Stable Diffusion and other models. Definitely worth a try.

'a cabin' Kandinsky v2.1
a cabin

'a fireman' Kandinsky v2.1
a fireman

'a hyperrealistic painting of an ocean by Ella Guru and Walter Emerson Baum rendered in unreal engine and photorealistic' Kandinsky v2.1
a hyperrealistic painting of an ocean by Ella Guru and Walter Emerson Baum rendered in unreal engine and photorealistic

'a lineart illustration of goldfish 4K photo and vivid colors' Kandinsky v2.1
a lineart illustration of goldfish 4K photo and vivid colors

'a portrait of a beautiful young girl in a garden at dusk' Kandinsky v2.1
a portrait of a beautiful young girl in a garden at dusk

'a robot' Kandinsky v2.1
a robot

'an impressionist painting of a happy family' Kandinsky v2.1
an impressionist painting of a happy family

'an ugly monster' Kandinsky v2.1
an ugly monster

'Harry Potter' Kandinsky v2.1
Harry Potter

'Spiderman' Kandinsky v2.1
Spiderman


Name: DeepFloyd IF
Author: Original script by DeepFloyd AI Research Band
Original script: https://github.com/deep-floyd/IF
Time for 1024×1024 on a 3090: 1 minute 17 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 only
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: A new alternative script to Stable Diffusion and other models. 1024×1024 native resolution is nice.
Click to see these samples in 1024×1024 resolution.


a babbling brook


a cathedral


a collage painting of a vast city lens flare and 8K 3D


a cove


a mountain cabin


a still life of a mountain path


a teddy bear


a worried man made of bones and wire


an allegory of Charmander


gorillas


Name: Kandinsky v2.2
Author: Original script by AI Forever
Original script: https://github.com/ai-forever/Kandinsky-2
Time for 1024×1024 on a 3090: 1 minute 3 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: An update to Kandinsky. Handles 1024×1024 resolution. Superb fast results.


a digital rendering of frogs 8K 3D and CryEngine


a gouache of a happy person made of liquid metal and metal by Emma Lampert Cooper and Zha Shibiao


a lounge room


a lush rainforest 8K 3D and for sale on Facebook Marketplace


a watercolor painting of an ugly person made of chrome and chrome


an attractive woman


an ugly face


computer graphics of fear ZBrush and filmic


impressionist of New York City trending on ArtStation and trending on Flickr


pixel art of a robot by Siona Shimshi and Nedroid


Name: SDXL 1.0
Author: Original script by Stability AI
Original script: https://github.com/Stability-AI/generative-models/tree/main
Time for 1024×1024 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: The latest from Stability AI. Handles 1024×1024 resolutions by default.


a detailed painting of a Pixar character


a hyperrealistic painting of a rough seascape


a kitten wearing pajamas and sunglasses in times square


A mystical forest filled with glowing mushrooms and iridescent butterflies, where a wise old owl perched on a branch watches over a group of playful fairies as they dance under the moonlight.


a new york city street in the rain


a painting of the amazon rainforest


a portrait of a beautiful young girl in a garden at dusk


a portrait of a female cyborg by h r giger


an oil painting of an ugly creature


Lovecraftian horror


Name: PixArt-alpha
Author: Original script by PixArt-alpha
Original script: https://github.com/PixArt-alpha/PixArt-alpha
Time for 1024×1024 on a 4090: 14 seconds
Description: Another fast Text-to-Image model/script. Handles 1024×1024 resolutions by default.


a cartoon of a monkey


a cove


a cute creature made of vines and mist


a fine art painting of a colorful parrot


a kangaroo


a photorealistic painting of a clown


etching of an ugly man


impressionist of a lighthouse


scribble art of an ugly person


the country rendered in Cinema4D and hyperrealistic


Name: Playground v2
Author: Original script by playgroundai
Original script: https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic
Time for 1024×1024 on a 4090: 15 seconds
Description: Stable diffusion alternate model trained from scratch. Handles 1024×1024 resolutions by default.


a cubist painting of an ugly man


a monkey made of liquid metal and feathers lens flare and psychedelic


a portrait of a young girl


a surrealist sculpture of a green tree frog


a witch made of vines and paper


an acrylic painting of a cloudy sunset


an anime drawing of a sad clown


fear


screen printing of a canyon


trypophobia


Name: Stable Cascade
Author: Original script by Stability AI
Original script: https://github.com/Stability-AI/StableCascade
Time for 1024×1024 on a 4090: 15 seconds
Description: Handles 1024×1024 resolutions by default. Fast and high quality outputs. Also handles widescreen high resolutions with minimal subject duplication.


a church 4K HD realism and trending on pixiv


a detailed matte painting of reflective spheres vivid colors and 8K 3D


a mosaic of a sorcerer by Isabel Codrington and John Harris


a photo of a cute girl


an ugly person


an ugly person by Ford Madox Brown and Al Feldstein


computer graphics of a cute creature 4K photo and trending on Flickr


digital sculpture of Big Bird


Harry Potter


Robocop


Name: PixArt-sigma
Author: Original script by PixArt-alpha
Original script: https://github.com/PixArt-alpha/PixArt-sigma
Time for 1024×1024 on a 4090: 14 seconds
Description: Updated from PixArt-alpha.


a caricature of love


a cliff


a cute girl by Ernesto Neto and Carel Weight rendered in Cinema4D and super detailed


a heart psychedelic and Flickr


a marble sculpture of Lovecraftian horror trending on ArtStation and trending on pixiv


a nightmare


a pond


an ultrafine detailed painting of an ocean by Ejnar Nielsen and Kim Deuk-sin


eyeballs made of mist and bones


watercolor of eyeballs


Stable Diffusion 3 Medium
Author: Original script by Stabiliy AI
Original script: https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers
Time for 1024×1024 on a 4090: 15 seconds
Description: New model from Stability AI. They only released the medium size model at this stage, so results are not the best. The following samples are cherry picked best results from a many random prompt batch run.


a cinematic painting of a waterfall


a demon


a detailed matte painting of anxiety


a hyperrealistic painting of a swamp


a mountain range #film and super detailed


a sea monster


a tiger


an octopus hyperdetailed and lens flare


Hermoine Granger


scratchboard art of a crying person Flickr and 8K 3D


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 7

This is Part 7. There is also Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 8 and Part 9.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Multi-Perceptor VQGAN+CLIP v4
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 2 minutes 36 seconds
Maximum resolution on a 24 GB 3090: 1120×480
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 4 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a bronze sculpture of a garden' Multi-Perceptor VQGAN+CLIP v4
a bronze sculpture of a garden

'a church by Tadeusz Kantor' Multi-Perceptor VQGAN+CLIP v4
a church by Tadeusz Kantor

'a color pencil sketch of a monkey hyperdetailed' Multi-Perceptor VQGAN+CLIP v4
a color pencil sketch of a monkey hyperdetailed

'a comic book panel of a lush rainforest' Multi-Perceptor VQGAN+CLIP v4
a comic book panel of a lush rainforest

'a matte painting of a witch by William Geissler' Multi-Perceptor VQGAN+CLIP v4
a matte painting of a witch by William Geissler

'a peninsula by Ei-Q CGSociety' Multi-Perceptor VQGAN+CLIP v4
a peninsula by Ei-Q CGSociety

'a surrealist sculpture of hell' Multi-Perceptor VQGAN+CLIP v4
a surrealist sculpture of hell

'an eyeball made of flowers' Multi-Perceptor VQGAN+CLIP v4
an eyeball made of flowers

'cyberpunk art of a canyon' Multi-Perceptor VQGAN+CLIP v4
cyberpunk art of a canyon

'lineart of dense woodland' Multi-Perceptor VQGAN+CLIP v4
lineart of dense woodland


Name: V-Majesty Diffusion v1.2
Authors: Original script by Dango233 and multimodalart
Original script: https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/v.ipynb
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: 1664×704.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new diffusion based script.

'a black and white photo of war' V-Majesty Diffusion
a black and white photo of war

'a brownstone by Oliver Sin super detailed' V-Majesty Diffusion
a brownstone by Oliver Sin super detailed

'a cute creature' V-Majesty Diffusion
a cute creature

'a doctor' V-Majesty Diffusion
a doctor

'a drawing of a babbling brook photorealistic' V-Majesty Diffusion
a drawing of a babbling brook photorealistic

'a hill' V-Majesty Diffusion
a hill

'a ninja by Michael Ford psychedelic' V-Majesty Diffusion
a ninja by Michael Ford psychedelic

'a photo of a beautiful young girl in a summer garden at dusk' V-Majesty Diffusion
a photo of a beautiful young girl in a summer garden at dusk

'a storybook illustration of a cozy den' V-Majesty Diffusion
a storybook illustration of a cozy den

'New York City' V-Majesty Diffusion
New York City


Name: Latent Majesty Diffusion v1.3
Authors: Original script by Dango233 and multimodalart
Original script: https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/latent.ipynb
Time for 512×512 on a 3090: 2 minutes 24 seconds
Maximum resolution on a 24 GB 3090: 512×512 (when using GFPGAN upscaling)
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Starts with a smaller resolution image (usually 256×256 pixels), upscales it with GFPGAN, and then does a few more diffusion passes. GFPGAN can really help get better coherency in faces.

'a hyperrealistic painting of a cute creature' Latent Majesty Diffusion
a hyperrealistic painting of a cute creature

'a hyperrealistic painting of an evil clown' Latent Majesty Diffusion
a hyperrealistic painting of an evil clown

'a picture of a tree' Latent Majesty Diffusion
a picture of a tree

'a surrealist painting of kittens' Latent Majesty Diffusion
a surrealist painting of kittens

'an engraving of an angry woman made of voxels' Latent Majesty Diffusion
an engraving of an angry woman made of voxels

'an oil painting of an attractive woman by Eileen Aldridge' Latent Majesty Diffusion
an oil painting of an attractive woman by Eileen Aldridge

'an ultrafine detailed painting of Bruce Willis 4K HD realism' Latent Majesty Diffusion
an ultrafine detailed painting of Bruce Willis 4K HD realism

'Robert DeNiro ZBrush' Latent Majesty Diffusion
Robert DeNiro ZBrush

'Tweety Pie' Latent Majesty Diffusion
Tweety Pie

'Yoda' Latent Majesty Diffusion
Yoda


Name: Huemin JAX Diffusion v2.7
Author: Huemin
Original script: https://colab.research.google.com/github/huemin-art/jax-guided-diffusion/blob/v2.7/Huemin_Jax_Diffusion_2_7.ipynb
Time for 512×512 on a 3090: 3 minutes 55 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Starts with a smaller resolution image (usually 256×256 pixels), upscales it with GFPGAN, and then does a few more diffusion passes. GFPGAN can really help get better coherency in faces.

'a babbling brook' Huemin JAX Diffusion
a babbling brook

'a mountain path Flickr' Huemin JAX Diffusion
a mountain path Flickr

'a river CGSociety' Huemin JAX Diffusion
a river CGSociety

'a spooky forest by John F. Peto' Huemin JAX Diffusion
a spooky forest by John F. Peto

'a storybook illustration of a mansion by Donald Roller Wilson' Huemin JAX Diffusion
a storybook illustration of a mansion by Donald Roller Wilson

'a surrealist sculpture of an alien forest' Huemin JAX Diffusion
a surrealist sculpture of an alien forest

'a watercolor painting of fear made of bones ZBrush' Huemin JAX Diffusion
a watercolor painting of fear made of bones ZBrush

'a wetland by Alexander Robertson super detailed' Huemin JAX Diffusion
a wetland by Alexander Robertson super detailed

'New York City' Huemin JAX Diffusion
New York City

'vector art of a bouquet of flowers ZBrush' Huemin JAX Diffusion
vector art of a bouquet of flowers ZBrush


Name: Disco Diffusion v5.2
Authors: Original script by @somnai, @gandamu and @zippy731
Original script: https://colab.research.google.com/github/zippy731/disco-diffusion-turbo/blob/skunk/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 2 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 768×768
Description: Latest version of Disco Diffusion.

'a bay by Louis Valtat Tri-X 400 TX' Disco Diffusion v5.2
a bay by Louis Valtat Tri-X 400 TX

'a hyperrealistic painting of a mansion' Disco Diffusion v5.2
a hyperrealistic painting of a mansion

'a hyperrealistic painting of a nightmare' Disco Diffusion v5.2
a hyperrealistic painting of a nightmare

'a pond' Disco Diffusion v5.2
a pond

'a tributary by Jacob Marrel' Disco Diffusion v5.2
a tributary by Jacob Marrel

'an alien landscape' Disco Diffusion v5.2
an alien landscape

'an island by Andrew Robertson IMAX' Disco Diffusion v5.2
an island by Andrew Robertson IMAX

'Frankenstein made of liquid metal CryEngine' Disco Diffusion v5.2
Frankenstein made of liquid metal CryEngine

'medusa by Gai Qi' Disco Diffusion v5.2
medusa by Gai Qi

'the Amazon Rainforest' Disco Diffusion v5.2
the Amazon Rainforest


Name: DALL-E Mini
Author: Original script by Boris Dayma
Original script: https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb
Time for 512×512 on a 3090: Locked to 256×256 – 1 minute 13 seconds
Maximum resolution on a 24 GB 3090: 256×256
Maximum resolution on an 8GB 2080: 256×256
Description: Capable of rendering multiple images in one pass. Very nice results. Limited to 256×256 at this time. These examples show a 4×4 grid of 16 images for each prompt.

a fine art painting of a fire breathing dragona fine art painting of a fire breathing dragon

a hyperrealistic painting of an ugly monstera hyperrealistic painting of an ugly monster

a planeta planet

a rivera river

a rosea rose

a surrealist painting of a kinga surrealist painting of a king

an airbrush painting of satanan airbrush painting of satan

an engraving of a frogan engraving of a frog

an ultrafine detailed painting of fearan ultrafine detailed painting of fear

Darth Vader trending on pixivDarth Vader trending on pixiv


Name: Latent Majesty Diffusion v1.6
Authors: Original script by Dango233 and multimodalart
Original script: https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/latent.ipynb
Time for 512×512 on a 3090: 2 minutes 07 seconds
Maximum resolution on a 24 GB 3090: 512×512 (when using GFPGAN upscaling)
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: The latest amazing update to Latent Diffusion. Awesome colors, textures, lighting, details, coherency. Highly recommended.

'a colorful parrot' Latent Majesty Diffusion v1.6
a colorful parrot

'a detailed matte painting of puppies' Latent Majesty Diffusion v1.6
a detailed matte painting of puppies

'a gallery' Latent Majesty Diffusion v1.6
a gallery

'a mountain cabin by Tom Palin 4K HD realism' Latent Majesty Diffusion v1.6
a mountain cabin by Tom Palin 4K HD realism

'a renaissance painting of a spooky forest' Latent Majesty Diffusion v1.6
a renaissance painting of a spooky forest

'a school of tropical fish by Jane Carpanini' Latent Majesty Diffusion v1.6
a school of tropical fish by Jane Carpanini

'an ugly creature' Latent Majesty Diffusion v1.6
an ugly creature

'the Amazon Rainforest photorealistic' Latent Majesty Diffusion v1.6
the Amazon Rainforest photorealistic

'The Grinch' Latent Majesty Diffusion v1.6
The Grinch

'Yoda trending on Flickr' Latent Majesty Diffusion v1.6
Yoda trending on Flickr


Name: Disco Diffusion v5.4
Authors: Original script by @somnai, @gandamu, @zippy731 and @devdef
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 20 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 768×768
Description: Latest version of Disco Diffusion.

'a 3D render of the Grand Canyon by Dóra Keresztes' Disco Diffusion v5.4
a 3D render of the Grand Canyon by Dóra Keresztes

'a hyperrealistic painting of a zombie made of cheese and feathers CryEngine and rendered in Cinema4D' Disco Diffusion v5.4
a hyperrealistic painting of a zombie made of cheese and feathers CryEngine and rendered in Cinema4D

'a macro photograph of an ugly creature' Disco Diffusion v5.4
a macro photograph of an ugly creature

'a matte painting of a monument' Disco Diffusion v5.4
a matte painting of a monument

'a picture of a vast city' Disco Diffusion v5.4
a picture of a vast city

'a skyscraper' Disco Diffusion v5.4
a skyscraper

'a thunder storm' Disco Diffusion v5.4
a thunder storm

'Jason Vorhees by Chen Chi' Disco Diffusion v5.4
Jason Vorhees by Chen Chi

'reflective spheres hyperrealistic' Disco Diffusion v5.4
reflective spheres hyperrealistic

'the country by Robert Thomas and Chen Jiru rendered in unreal engine and 4K photo' Disco Diffusion v5.4
the country by Robert Thomas and Chen Jiru rendered in unreal engine and 4K photo


Name: Pixel Art Diffusion v3
Authors: Original script by @somnai, @gandamu, @zippy731 and @KaliYuga_ai
Original script: https://colab.research.google.com/github/KaliYuga-ai/Pixel-Art-Diffusion/blob/main/Pixel_Art_Diffusion_v3_0_(With_Disco_Symmetry).ipynb
Time for 512×512 on a 3090: 3 minutes 14 seconds
Maximum resolution on a 24 GB 3090: 1920×1088
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Uses a fine tuned model to generate pixel art like imagery.

'a bedroom hyperrealistic and hyperdetailed #pixelart' Pixel Art Diffusion v3
a bedroom hyperrealistic and hyperdetailed #pixelart

'a cephalopod IMAX and lens flare #pixelart' Pixel Art Diffusion v3
a cephalopod IMAX and lens flare #pixelart

'a cinematic painting of a cottage #pixelart' Pixel Art Diffusion v3
a cinematic painting of a cottage #pixelart

'a cross stitch of a townhouse made of voxels and timber #pixelart' Pixel Art Diffusion v3
a cross stitch of a townhouse made of voxels and timber #pixelart

'a pastel of a cute girl #pixelart' Pixel Art Diffusion v3
a pastel of a cute girl #pixelart

'a surrealist sculpture of Charmander #pixelart' Pixel Art Diffusion v3
a surrealist sculpture of Charmander #pixelart

'an alien city by Anders Zorn and Laura Muntz Lyall photorealistic and CGSociety #pixelart' Pixel Art Diffusion v3
an alien city by Anders Zorn and Laura Muntz Lyall photorealistic and CGSociety #pixelart


concept art of a wetland ZBrush and CryEngine #pixelart

'Frankenstein made of string and vines trending on pixiv and hyperrealistic #pixelart' Pixel Art Diffusion v3
Frankenstein made of string and vines trending on pixiv and hyperrealistic #pixelart

'pixel art of a cloudy sunset #pixelart' Pixel Art Diffusion v3
pixel art of a cloudy sunset #pixelart


Name: Disco Diffusion v5.6
Authors: Original script by @somnai and @gandamu
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 3 minutes 34 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 768×768
Description: Latest version of Disco Diffusion.

'a collage painting of a lush rainforest by Doc Hammer and Alexander Ivanov hyperrealistic and CryEngine' Disco Diffusion v5.4
a collage painting of a lush rainforest by Doc Hammer and Alexander Ivanov hyperrealistic and CryEngine

'a cubist painting of a lion and a sunset CryEngine and trending on pixiv' Disco Diffusion v5.4
a cubist painting of a lion and a sunset CryEngine and trending on pixiv

'a fine art painting of a zombie' Disco Diffusion v5.4
a fine art painting of a zombie

'a gulf by I Ketut Soki and Alfons von Czibulka' Disco Diffusion v5.4
a gulf by I Ketut Soki and Alfons von Czibulka

'a monastery trending on Flickr and #film' Disco Diffusion v5.4
a monastery trending on Flickr and #film

'a morning landscape' Disco Diffusion v5.4
a morning landscape

'a prairie CGSociety and CryEngine' Disco Diffusion v5.4
a prairie CGSociety and CryEngine

'a werewolf' Disco Diffusion v5.4
a werewolf

'ballpoint pen art of a monument' Disco Diffusion v5.4
ballpoint pen art of a monument


cyberpunk art of heaven filmic and CryEngine


Name: CLIP Guided k-diffusion
Author: Original script by Katherine Crowson
Original script: https://colab.research.google.com/drive/1w0HQqxOKCk37orHATPxV8qb0wb4v-qa0
Time for 512×512 on a 3090: 6 minutes 56 seconds
Maximum resolution on a 24 GB 3090: Fixed to 512×512 resolution.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new script by Katherine. Seems to generate more abstract results and these example images needed a long run of random prompts to select from.

'a jigsaw puzzle of paranoia by Petr Brandl and Sasha Putrya' CLIP Guided k-diffusion v5.4
a jigsaw puzzle of paranoia by Petr Brandl and Sasha Putrya

'a landscape vivid colors' CLIP Guided k-diffusion v5.4
a landscape vivid colors

'a pastel of Cookie Monster by Ren Bonian and Ángel Botello for sale on Facebook Marketplace and CryEngine' CLIP Guided k-diffusion v5.4
a pastel of Cookie Monster by Ren Bonian and Ángel Botello for sale on Facebook Marketplace and CryEngine

'a reef' CLIP Guided k-diffusion v5.4
a reef

'a renaissance painting of Al Pacino' CLIP Guided k-diffusion v5.4
a renaissance painting of Al Pacino

'a statue of a submarine made of metal and crystals by James Sessions American painter and Elfriede Lohse-Wächtler' CLIP Guided k-diffusion v5.4
a statue of a submarine made of metal and crystals by James Sessions American painter and Elfriede Lohse-Wächtler

'an airbrush painting of a nightmare creature vivid colors and rendered in Cinema4D' CLIP Guided k-diffusion v5.4
an airbrush painting of a nightmare creature vivid colors and rendered in Cinema4D

'an oil painting of a cephalopod made of paper and mist' CLIP Guided k-diffusion v5.4
an oil painting of a cephalopod made of paper and mist

'an ugly person and an area 4K HD realism and trending on pixiv' CLIP Guided k-diffusion v5.4
an ugly person and an area 4K HD realism and trending on pixiv

'conceptual art of an ugly monster' CLIP Guided k-diffusion v5.4
conceptual art of an ugly monster


Name: CLIP Prior + VQGAN (MSE method)
Author: Original script by Katherine Crowson
Original script: https://colab.research.google.com/drive/1yOpCY9eXvzELHppvh-o0DevhxVYOGr5i
Time for 512×512 on a 3090: 3 minutes 31 seconds
Maximum resolution on a 24 GB 3090: 832×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new script by Katherine. Can give some interesting details but coherence may suffer at larger resolutions.

'a collage painting of a tiger vivid colors and photorealistic' CLIP Prior + VQGAN (MSE method)
a collage painting of a tiger vivid colors and photorealistic

'a cove 4K photo and CryEngine' CLIP Prior + VQGAN (MSE method)
a cove 4K photo and CryEngine

'a cute creature' CLIP Prior + VQGAN (MSE method)
a cute creature

'a glacier' CLIP Prior + VQGAN (MSE method)
a glacier

'a space nebula' CLIP Prior + VQGAN (MSE method)
a space nebula

'a townhouse' CLIP Prior + VQGAN (MSE method)
a townhouse

'a valley' CLIP Prior + VQGAN (MSE method)
a valley

'an oil painting of a peacock by Wu Hong and Eve Ryder' CLIP Prior + VQGAN (MSE method)
an oil painting of a peacock by Wu Hong and Eve Ryder

'Cthulhu' CLIP Prior + VQGAN (MSE method)
Cthulhu

'digital art of a wetland made of cheese and timber by Jacob Duck and Jacob Gerritsz Cuyp' CLIP Prior + VQGAN (MSE method)
digital art of a wetland made of cheese and timber by Jacob Duck and Jacob Gerritsz Cuyp


Name: Latent Diffusion LAION_400M v2
Author: Original script by pesser
Original script: https://github.com/pesser/stable-diffusion
Time for 16 256×256 images on a 3090: 49 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Renders multiple images quickly. Coherency is best at 256×256 so these example images are 2×2 tiled results. Each took 35 seconds on a 3090.

'a babbling brook' Latent Diffusion LAION_400M v2
a babbling brook

'a colorful parrot' Latent Diffusion LAION_400M v2
a colorful parrot

'a fine art painting of a castle' Latent Diffusion LAION_400M v2
a fine art painting of a castle

'a matte painting of a rose' Latent Diffusion LAION_400M v2
a matte painting of a rose

'a pencil sketch of a cave 4K photo and hyperrealistic' Latent Diffusion LAION_400M v2
a pencil sketch of a cave 4K photo and hyperrealistic

'a photorealistic painting of Cthulhu for sale on Facebook Marketplace and Flickr' Latent Diffusion LAION_400M v2
a photorealistic painting of Cthulhu for sale on Facebook Marketplace and Flickr

'a surrealist painting of a cloudy sunset' Latent Diffusion LAION_400M v2
a surrealist painting of a cloudy sunset

'a surrealist painting of a monkey' Latent Diffusion LAION_400M v2
a surrealist painting of a monkey

'an illustration of of a tiger by Stanley Twardowicz and Antoni Pitxot' Latent Diffusion LAION_400M v2
an illustration of of a tiger by Stanley Twardowicz and Antoni Pitxot

'an impressionist painting of a cottage' Latent Diffusion LAION_400M v2
an impressionist painting of a cottage


Name: Stable Diffusion
Author: Original script by pesser
Original script: https://github.com/CompVis/stable-diffusion
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one.

'a black and white photo of puppies' Stable Diffusion
a black and white photo of puppies

'a cathedral rendered in unreal engine and super detailed' Stable Diffusion
a cathedral rendered in unreal engine and super detailed

'a city made of mist trending on ArtStation and trending on Flickr' Stable Diffusion
a city made of mist trending on ArtStation and trending on Flickr

'a detailed matte painting of a lush rainforest made of crystals and feathers' Stable Diffusion
a detailed matte painting of a lush rainforest made of crystals and feathers

'a king' Stable Diffusion
a king

'a polaroid photo of a clown vivid colors and 8K 3D' Stable Diffusion
a polaroid photo of a clown vivid colors and 8K 3D

'an airbrush painting of the Terminator CryEngine and for sale on Facebook Marketplace' Stable Diffusion
an airbrush painting of the Terminator CryEngine and for sale on Facebook Marketplace

'an ambient occlusion render of a wetland by William Forsyth and Victorine Foot trending on pixiv and CryEngine' Stable Diffusion
an ambient occlusion render of a wetland by William Forsyth and Victorine Foot trending on pixiv and CryEngine

'poster art of a farm by Frederic Leighton and Yang Borun rendered in unreal engine and 8K 3D' Stable Diffusion
poster art of a farm by Frederic Leighton and Yang Borun rendered in unreal engine and 8K 3D

'the Australian outback' Stable Diffusion
the Australian outback


Name: Deforum Stable Diffusion v0.3
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support.

'a babbling brook' Deforum Stable Diffusion
a babbling brook

'a forest path' Deforum Stable Diffusion
a forest path

'a photo of a lake' Deforum Stable Diffusion
a photo of a lake

'a ranch by Nikolai Alekseyevich Kasatkin and Harriet Zeitlin' Deforum Stable Diffusion
a ranch by Nikolai Alekseyevich Kasatkin and Harriet Zeitlin

'a watercolor painting of a rectory hyperdetailed and trending on ArtStation' Deforum Stable Diffusion
a watercolor painting of a rectory hyperdetailed and trending on ArtStation

'Al Pacino by Lam Qua and George Frederick Harris' Deforum Stable Diffusion
Al Pacino by Lam Qua and George Frederick Harris

'an angry person by Kazys Varnelis and Dóra Keresztes' Deforum Stable Diffusion
an angry person by Kazys Varnelis and Dóra Keresztes

'an astronaut' Deforum Stable Diffusion
an astronaut

'puppies' Deforum Stable Diffusion
puppies

'war' Deforum Stable Diffusion
war

giger xenomorphs, airbrush, HD, 4K, 8K, hyperrealistic, highly detailed, highly textured


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 6

This is Part 6. There is also Part 1, Part 2, Part 3, Part 4, Part 5, Part 7, Part 8 and Part 9.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Augmented CLIP Guided Diffusion
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 1 minutes 16 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 256×256 57 seconds
Description: Another CLIP Guided Diffusion script. Fast. Gives unique textured results.

'a detailed painting of people by Nicolette Macnamara' Augmented CLIP Guided Diffusion
a detailed painting of people by Nicolette Macnamara

'a diagram of a nightmare creature made of gold' Augmented CLIP Guided Diffusion
a diagram of a nightmare creature made of gold

'a nightmare creature' Augmented CLIP Guided Diffusion
a nightmare creature

'a painting of a cabin next to a stream in a secluded forest' Augmented CLIP Guided Diffusion
a painting of a cabin next to a stream in a secluded forest

'a storybook illustration of Jabba The Hutt by Carle Hessay' Augmented CLIP Guided Diffusion
a storybook illustration of Jabba The Hutt by Carle Hessay

'a werewolf by A R Middleton Todd' Augmented CLIP Guided Diffusion
a werewolf by A R Middleton Todd

'an oil painting of Big Bird' Augmented CLIP Guided Diffusion
an oil painting of Big Bird

'Gandalf trending on pixiv' Augmented CLIP Guided Diffusion
Gandalf trending on pixiv

'Lovecraftian horror' Augmented CLIP Guided Diffusion
Lovecraftian horror

'Lovecraftian horror' Augmented CLIP Guided Diffusion
poster art of the Las Vegas strip by George Passantino


Name: Princess Generator
Author: Dango233
Original script: https://colab.research.google.com/drive/1QgH9TvQMXR3PpEGBcHnghtEcwFDXLaYE
Time for 512×512 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: The latest update to “CLIP Guided Diffusion v6” from Dango233. Can give some superb results. Worth exploring and experimenting with further.

'a cloudy sunset' Princess Generator
a cloudy sunset

'a fireplace by Jacob More' Princess Generator
a fireplace by Jacob More

'a happy alien by James Jarvaise' Princess Generator
a happy alien by James Jarvaise

'a mountain path by Stephen Pace' Princess Generator
a mountain path by Stephen Pace

'a raytraced image of a western town' Princess Generator
a raytraced image of a western town

'a teddy bear' Princess Generator
a teddy bear

'Charmander made of wood by Hua Yan' Princess Generator
Charmander made of wood by Hua Yan

'dense woodland by Marie Angel' Princess Generator
dense woodland by Marie Angel

'paranoia by Floris van Dyck' Princess Generator
paranoia by Floris van Dyck

'portrait of Princess Victoria trending on artstation' Princess Generator
portrait of Princess Victoria trending on artstation


Name: Disco Diffusion v4.1
Author: @Somnai
Original script: https://colab.research.google.com/drive/1sHfRn5Y0YKYKi1k-ifUSBFRNJ8_1sa39
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 39 seconds.
Description: The latest update to Disco Diffusion. Really nice detailed outputs. Low VRAM requirments allow huge sized images. I didn’t realise I had 3 zombie themed results in this random batch.

'a bronze sculpture of a zombie' Disco Diffusion v4.1
a bronze sculpture of a zombie

'a fantasy land' Disco Diffusion v4.1
a fantasy land

'a pencil sketch of Cthulhu by Rudolf Koller' Disco Diffusion v4.1
a pencil sketch of Cthulhu by Rudolf Koller

'a pop art painting of zombies' Disco Diffusion v4.1
a pop art painting of zombies

'a portrait of a young boy by Hendrick Cornelisz. van Vliet' Disco Diffusion v4.1
a portrait of a young boy by Hendrick Cornelisz. van Vliet

'a tree by Philips Wouwerman' Disco Diffusion v4.1
a tree by Philips Wouwerman

'a western town' Disco Diffusion v4.1
a western town

'a zombie' Disco Diffusion v4.1
a zombie

'Han Solo psychedelic' Disco Diffusion v4.1
Han Solo psychedelic

'vector art of the Amazon Rainforest' Disco Diffusion v4.1
vector art of the Amazon Rainforest


Name: Hypertron v2
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 18 seconds
Description: Version 2 of Hypertron. More models, more flavors. Works OK. Can give the “image in a sea of purple/grey” that previous MSE based scripts suffered from. Can give good results if you let it run a large random batch overnight.

'a bronze sculpture of a spooky forest by Herb Aach' Hypertron v2
a bronze sculpture of a spooky forest by Herb Aach

'a diamond made of flowers' Hypertron v2
a diamond made of flowers

'a gouache of an android by Wu Bin' Hypertron v2
a gouache of an android by Wu Bin

'a photo of a kitchen' Hypertron v2
a photo of a kitchen

'a photorealistic painting of a cemetery' Hypertron v2
a photorealistic painting of a cemetery

'a sketch of a haunted house' Hypertron v2
a sketch of a haunted house

'a tattoo of Squirtle made of clay' Hypertron v2
a tattoo of Squirtle made of clay

'an art deco painting of a human by Nicolas Lancret 8K 3D' Hypertron v2
an art deco painting of a human by Nicolas Lancret 8K 3D

'goldfish by Elfriede Lohse-Wächtler' Hypertron v2
goldfish by Elfriede Lohse-Wächtler

'Lovecraftian horror by Aileen Eagleton' Hypertron v2
Lovecraftian horror by Aileen Eagleton


Name: CC12M Diffusion
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1TBo4saFn1BCSfgXsmREFrUl3zSQFg6CC
Time for 512×512 on a 3090: 1 minute 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704.
Maximum resolution on an 8GB 2080: 832×512 2 minutes 59 seconds
Description: Can support higher resolutions, but the coherance really falls apart with anything over 256×256. It handles multiple images at once, so these examples are 4 256×256 results.

'a beachside resort' CC12M Diffusion
a beachside resort

'a bouquet of flowers' CC12M Diffusion
a bouquet of flowers

'a castle' CC12M Diffusion
a castle

'a cemetery' CC12M Diffusion
a cemetery

'a cephalopod by Walter Stuempfig super detailed' CC12M Diffusion
a cephalopod by Walter Stuempfig super detailed

'a color pencil sketch of a bedroom super detailed' CC12M Diffusion
a color pencil sketch of a bedroom super detailed

'a kitchen' CC12M Diffusion
a kitchen

'a mountainscape' CC12M Diffusion
a mountainscape

'a nightclub' CC12M Diffusion
a nightclub

'a vast city' CC12M Diffusion
a vast city


Name: Disco Diffusion v5
Authors: @Somnai and @Gandamu
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 02 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 43 seconds.
Description: The latest update to Disco Diffusion.

'a cloudy sunset' Disco Diffusion v5
a cloudy sunset

'a crying person made of wrought iron by František Jakub Prokyš psychedelic' Disco Diffusion v5
a crying person made of wrought iron by František Jakub Prokyš psychedelic

'a flemish baroque of a school of tropical fish' Disco Diffusion v5
a flemish baroque of a school of tropical fish

'a low poly render of puppies' Disco Diffusion v5
a low poly render of puppies

'a morning landscape' Disco Diffusion v5
a morning landscape

'a mosaic of a worried man by Paul Lohse' Disco Diffusion v5
a mosaic of a worried man by Paul Lohse

'a thunder storm by Cornelis Claesz van Wieringen' Disco Diffusion v5
a thunder storm by Cornelis Claesz van Wieringen

'a tropical beach' Disco Diffusion v5
a tropical beach

'computer rendering of an evil alien 4K HD realism' Disco Diffusion v5
computer rendering of an evil alien 4K HD realism

'the human condition Flickr' Disco Diffusion v5
the human condition Flickr


Name: Disco Diffusion v5 Turbo Smooth
Authors: Chris Allen
Original script: https://colab.research.google.com/github/zippy731/disco-diffusion-turbo/blob/main/Disco_Diffusion_v5_Turbo_%5Bw_3D_animation%5D.ipynb
Time for 512×512 on a 3090: 1 minutes 14 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 832×384. 2 minutes 21 seconds.
Description: An updated version of Disco Diffusion v5 that gives fast and smooth movie outputs.

'a black and white photo of a lush rainforest trending on Flickr' Disco Diffusion v5 Turbo Smooth
a black and white photo of a lush rainforest trending on Flickr

'a detailed matte painting of a factory' Disco Diffusion v5 Turbo Smooth
a detailed matte painting of a factory

'a hacker by Mykola Burachek' Disco Diffusion v5 Turbo Smooth
a hacker by Mykola Burachek

'a sea monster CGSociety' Disco Diffusion v5 Turbo Smooth
a sea monster CGSociety

'a surrealist painting of a happy person' Disco Diffusion v5 Turbo Smooth
a surrealist painting of a happy person

'a tardigrade by Cosmo Alexander' Disco Diffusion v5 Turbo Smooth
a tardigrade by Cosmo Alexander

'an anime drawing of an evening landscape by Daphne Fedarb photorealistic' Disco Diffusion v5 Turbo Smooth
an anime drawing of an evening landscape by Daphne Fedarb photorealistic

'an art deco painting of a happy person by John Uzzell Edwards' Disco Diffusion v5 Turbo Smooth
an art deco painting of a happy person by John Uzzell Edwards

'chalk art of a bouquet of flowers' Disco Diffusion v5 Turbo Smooth
chalk art of a bouquet of flowers

'the human condition' Disco Diffusion v5 Turbo Smooth
the human condition


Name: Augmented CLIP Guided Diffusion v2
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 2 minutes 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 512×512 4 minutes 56 seconds
Description: Updaterd version of the Augmented CLIP Guided Diffusion script.

'a bungalow 4K HD realism' Augmented CLIP Guided Diffusion v2
a bungalow 4K HD realism

'a forest fire' Augmented CLIP Guided Diffusion v2
a forest fire

'a lush rainforest CryEngine' Augmented CLIP Guided Diffusion v2
a lush rainforest CryEngine

'a painting of a kitchen by Betye Saar' Augmented CLIP Guided Diffusion v2
a painting of a kitchen by Betye Saar

'a portrait of a princess trending on artstation' Augmented CLIP Guided Diffusion v2
a portrait of a princess trending on artstation

'a spooky forest' Augmented CLIP Guided Diffusion v2
a spooky forest

'a tattoo of a zombie' Augmented CLIP Guided Diffusion v2
a tattoo of a zombie

'a werewolf by David Cooke Gibson' Augmented CLIP Guided Diffusion v2
a werewolf by David Cooke Gibson

'an oil painting of a lake' Augmented CLIP Guided Diffusion v2
an oil painting of a lake

'an ugly man' Augmented CLIP Guided Diffusion v2
an ugly man


Name: v-diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128 1 minute 19 seconds
Description: Updated version of Velocity-Diffusion. Tends to make incoherant collage images over 256×256.

'a black and white photo of a portrait of a young girl' v-diffusion Text-to-Image
a black and white photo of a portrait of a young girl

'a cityscape by Lujo Bezeredi' v-diffusion Text-to-Image
a cityscape by Lujo Bezeredi

'a cloudy sunset' v-diffusion Text-to-Image
a cloudy sunset

'a hologram of a sad face by Josef Šíma' v-diffusion Text-to-Image
a hologram of a sad face by Josef Šíma

'a lounge room by Riad Beyrouti IMAX' v-diffusion Text-to-Image
a lounge room by Riad Beyrouti IMAX

'a mountain path' v-diffusion Text-to-Image
a mountain path

'a portrait of a young boy made of metal' v-diffusion Text-to-Image
a portrait of a young boy made of metal

'a portrait of a young girl' v-diffusion Text-to-Image
a portrait of a young girl

'a space nebula' v-diffusion Text-to-Image
a space nebula

'an acrylic painting of a mountain range' v-diffusion Text-to-Image
an acrylic painting of a mountain range


Name: GLID-3
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3
Time for 512×512 on a 3090: 35 seconds
Maximum resolution on a 24 GB 3090: 768×768.
Maximum resolution on an 8GB 2080: 512×512 50 seconds
Description: Great textures and lighting. Poor image coherency.

'a cemetery' GLID-3 Text-to-Image
a cemetery

'a drawing of a cloudy sunset' GLID-3 Text-to-Image
a drawing of a cloudy sunset

'a drawing of a human lens flare' GLID-3 Text-to-Image
a drawing of a human lens flare

'a lake' GLID-3 Text-to-Image
a lake

'a large waterfall made of silver' GLID-3 Text-to-Image
a large waterfall made of silver

'a marina' GLID-3 Text-to-Image
a marina

'a minimalist painting of a teddy bear by Johann Ludwig Bleuler' GLID-3 Text-to-Image
a minimalist painting of a teddy bear by Johann Ludwig Bleuler

'a renaissance painting of paranoia made of vines' GLID-3 Text-to-Image
a renaissance painting of paranoia made of vines

'an abbey by Cornelis Pietersz' GLID-3 Text-to-Image
an abbey by Cornelis Pietersz

'an art deco painting of a rose' GLID-3 Text-to-Image
an art deco painting of a rose


Name: Disco Diffusion v5.1
Authors: @Somnai, @Gandamu and Chris Allen
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 05 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 40 seconds.
Description: Latest version of Disco Diffusion incorporating the “Turbo” features of v5 that gives fast and smooth movie outputs.

'a flemish baroque of a sunset' Disco Diffusion v5 Turbo Smooth
a flemish baroque of a sunset

'a marsh' Disco Diffusion v5 Turbo Smooth
a marsh

'a mid-nineteenth century engraving of New York City' Disco Diffusion v5 Turbo Smooth
a mid-nineteenth century engraving of New York City

'a minimalist painting of a cephalopod' Disco Diffusion v5 Turbo Smooth
a minimalist painting of a cephalopod

'a photo of Dracula' Disco Diffusion v5 Turbo Smooth
a photo of Dracula

'a watercolor painting of a knight' Disco Diffusion v5 Turbo Smooth
a watercolor painting of a knight

'an ugly person by Samuel Colman trending on ArtStation' Disco Diffusion v5 Turbo Smooth
an ugly person by Samuel Colman trending on ArtStation

'chalk art of Gandalf' Disco Diffusion v5 Turbo Smooth
chalk art of Gandalf

'lineart of a zombie' Disco Diffusion v5 Turbo Smooth
lineart of a zombie

'the Amazon Rainforest 4K HD realism' Disco Diffusion v5 Turbo Smooth
the Amazon Rainforest 4K HD realism


Name: Latent Diffusion LAION_400M
Authors: @multimodalart
Original script: https://colab.research.google.com/github/multimodalart/latent-diffusion-notebook/blob/main/Latent_Diffusion_LAION_400M_model_text_to_image.ipynb
Time for 512×512 on a 3090: 57 seconds
Maximum resolution on a 24 GB 3090: 1152×512 or 768×768
Maximum resolution on an 8GB 2080: 256×256. 1 minute 12 seconds.
Description: A new script based on the newly trained LAION_400M moidel. Impressive results at 256×256. Loses coherency at larger sizes. These examples are 4 256×256 images of each prompt.

'a black and white photo of a nightmare creature' Latent Diffusion LAION_400M
a black and white photo of a nightmare creature

'a futuristic city' Latent Diffusion LAION_400M
a futuristic city

'a hyperrealistic painting of a queen made of flowers' Latent Diffusion LAION_400M
a hyperrealistic painting of a queen made of flowers

'a painting of a happy clown' Latent Diffusion LAION_400M
a painting of a happy clown

'a skeleton' Latent Diffusion LAION_400M
a skeleton

'a stained glass window 4K HD realism' Latent Diffusion LAION_400M
a stained glass window 4K HD realism

'a watercolor painting of a lounge room' Latent Diffusion LAION_400M
a watercolor painting of a lounge room

'an eagle' Latent Diffusion LAION_400M
an eagle

'an ultrafine detailed painting of Harry Potter' Latent Diffusion LAION_400M
an ultrafine detailed painting of Harry Potter

'vector art of a zombie by Oskar Kokoschka' Latent Diffusion LAION_400M
vector art of a zombie by Oskar Kokoschka


Name: JAX CLIP Guided Diffusion v2.7
Author: nshepperd
Original script: https://colab.research.google.com/drive/1nmtcbQsE8sTjfLJ1u3Y4d6vi9ZTAvQph
Time for 512×512 on a 3090: 2 minutes 37 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 512×512. 3 minutes 59 seconds.
Description: ANother diffusion based script. Can give very nice high detail results.

'a Dalek made of feathers' JAX CLIP Guided Diffusion v2.7
a Dalek made of feathers

'a haunted house' JAX CLIP Guided Diffusion v2.7
a haunted house

'a picture of a chateau by Odhise Paskali' JAX CLIP Guided Diffusion v2.7
a picture of a chateau by Odhise Paskali

'a refinery' JAX CLIP Guided Diffusion v2.7
a refinery

'a studio by Allan Ramsay trending on ArtStation' JAX CLIP Guided Diffusion v2.7
a studio by Allan Ramsay trending on ArtStation

'a sunset' JAX CLIP Guided Diffusion v2.7
a sunset

'a thunder storm' JAX CLIP Guided Diffusion v2.7
a thunder storm

'a watercolor painting of a fire breathing dragon' JAX CLIP Guided Diffusion v2.7
a watercolor painting of a fire breathing dragon

'a witch made of mist' JAX CLIP Guided Diffusion v2.7
a witch made of mist

'the tropics by Thomas de Keyser' JAX CLIP Guided Diffusion v2.7
the tropics by Thomas de Keyser


Name: GLID-3-XL
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3-xl
Time for 512×512 on a 3090: 1 minute 04 seconds
Maximum resolution on a 24 GB 3090: 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Improved/updated version of GLID-3. Uses CLIP for better accuracy. Great textures and lighting. Poor image coherency when over 256×256.

'a demon' GLID-3-XL Text-to-Image
a demon

'a detailed matte painting of a bouquet of flowers' GLID-3-XL Text-to-Image
a detailed matte painting of a bouquet of flowers

'a kitchen' GLID-3-XL Text-to-Image
a kitchen

'a photorealistic painting of a movie monster hyperrealistic' GLID-3-XL Text-to-Image
a photorealistic painting of a movie monster hyperrealistic

'a picture of The Incredible Hulk by Kazimir Malevich' GLID-3-XL Text-to-Image
a picture of The Incredible Hulk by Kazimir Malevich

'a pop art painting of an angry woman' GLID-3-XL Text-to-Image
a pop art painting of an angry woman

'a spooky forest' GLID-3-XL Text-to-Image
a spooky forest

'an abbey' GLID-3-XL Text-to-Image
an abbey

'New York City by Marie Courtois' GLID-3-XL Text-to-Image
New York City by Marie Courtois

'poster art of Gandalf vivid colors' GLID-3-XL Text-to-Image
poster art of Gandalf vivid colors


Name: ruDALL-E Aspect Ratio
Author: Alex Shonenkov
Original script: https://github.com/shonenkov-AI/rudalle-aspect-ratio
Time for 512×512 on a 3090: N/A
Maximum resolution on a 24 GB 3090: N/A
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Version of ruDALL-E that generates wide and/or tall aspect ratio images. The shorter side is limited to 256 pixels. Results can be very nice. Will generate multiple images at once, so these sample images have 4 results per prompt.

'a black and white photo of a werewolf' ruDALL-E Aspect Ratio Text-to-Image
a black and white photo of a werewolf

'a cartoon of a swamp' ruDALL-E Aspect Ratio Text-to-Image
a cartoon of a swamp

'a large waterfall made of metal' ruDALL-E Aspect Ratio Text-to-Image
a large waterfall made of metal

'a lounge room' ruDALL-E Aspect Ratio Text-to-Image
a lounge room

'a matte painting of a townhouse' ruDALL-E Aspect Ratio Text-to-Image
a matte painting of a townhouse

'a palace made of mist' ruDALL-E Aspect Ratio Text-to-Image
a palace made of mist

'a photo of an ugly woman' ruDALL-E Aspect Ratio Text-to-Image
a photo of an ugly woman

'a tropical beach' ruDALL-E Aspect Ratio Text-to-Image
a tropical beach

'an evil clown' ruDALL-E Aspect Ratio Text-to-Image
an evil clown

'dense woodland' ruDALL-E Aspect Ratio Text-to-Image
dense woodland

Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 5

This is Part 5. There is also Part 1, Part 2, Part 3, Part 4, Part 6, Part 7, Part 8 and Part 9.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Multi-Perceptor CLIP Guided Diffusion Secondary Model Method
Author: SOMNAI
Original script: https://colab.research.google.com/drive/1Pf5F84FzWe9iAKNbiPaEo_v4hvQZ9SqS
Time for 512×512 on a 3090: 7 minutes 23 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: The winner for the longest name so far. Needs tweaking as the addition of the secondary model here reduces the usual excellent quality of the Multi-Perceptor CLIP Guided Diffusion. Still shows a lot of potential.

'a 3D render of Robocop' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a 3D render of Robocop

'a futuristic city IMAX' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a futuristic city IMAX

'a matte painting of trypophobia' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a matte painting of trypophobia

'a renaissance painting of a cloudy sunset trending on ArtStation' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a renaissance painting of a cloudy sunset trending on ArtStation

'a woman 4K photo' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a woman 4K photo

'an evil clown Flickr' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an evil clown Flickr

'an oil painting of a nightmare creature by Louis Janmot' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an oil painting of a nightmare creature by Louis Janmot

'Indiana Jones' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
Indiana Jones

'reflective spheres' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
reflective spheres

'zombies filmic' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
zombies filmic


Name: Multi-Perceptor VQGAN+CLIP v2
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 45 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 2 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a babbling brook by Zhou Wenjing' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a babbling brook by Zhou Wenjing

'a bedroom by Francesco Furini' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a bedroom by Francesco Furini

'a computer by Édouard Detaille' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a computer by Édouard Detaille

'a cross stitch of a landscape vivid colors' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a cross stitch of a landscape vivid colors

'a kitchen filmic' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a kitchen filmic

'a matte painting of halloween' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a matte painting of halloween

'a pastel of a peacock' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a pastel of a peacock

'a storybook illustration of a kitchen by Lena Alexander' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a storybook illustration of a kitchen by Lena Alexander

'an oil on canvas painting of a zombie made of voxels' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
an oil on canvas painting of a zombie made of voxels

'vector art of Darth Vader' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
vector art of Darth Vader


Name: 360Diffusion
Author: @sadly_existent
Original script: https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_Public.ipynb
Time for 512×512 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 28 seconds
Description: A new diffusion based script. Capable of some interesting results

'a bronze sculpture of a crying person by Auguste BaudBovy' 360Diffusion Text-to-Image
a bronze sculpture of a crying person by Auguste BaudBovy

'a flemish baroque of a bouquet of flowers' 360Diffusion Text-to-Image
a flemish baroque of a bouquet of flowers

'a haunted house trending on ArtStation' 360Diffusion Text-to-Image
a haunted house trending on ArtStation

'a hyperrealistic painting of trypophobia by Xia Gui' 360Diffusion Text-to-Image
a hyperrealistic painting of trypophobia by Xia Gui

'a nightmare creature' 360Diffusion Text-to-Image
a nightmare creature

'a space nebula rendered in Cinema4D' 360Diffusion Text-to-Image
a space nebula rendered in Cinema4D

'a tentacle monster 4K HD realism' 360Diffusion Text-to-Image
a tentacle monster 4K HD realism

'an oil on canvas painting of Danny Trejo by Pablo Rey' 360Diffusion Text-to-Image
an oil on canvas painting of Danny Trejo by Pablo Rey

'Frankenstein' 360Diffusion Text-to-Image
Frankenstein

'heaven 8K 3D' 360Diffusion Text-to-Image
heaven 8K 3D


Name: Multi-Perceptor VQGAN+CLIP v3
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 3 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a bronze sculpture of Gandalf' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a bronze sculpture of Gandalf

'a clown made of clay' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a clown made of clay

'a detailed painting of a desert oasis' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a detailed painting of a desert oasis

'a house by Kathleen Guthrie' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a house by Kathleen Guthrie

'a peacock made of metal' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a peacock made of metal

'a tilt shift photo of the Las Vegas strip' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a tilt shift photo of the Las Vegas strip

'a watercolor painting of reflective spheres 8K 3D' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a watercolor painting of reflective spheres 8K 3D

'an art deco painting of an amusement park' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
an art deco painting of an amusement park

'lineart of Big Bird by Alesso Baldovinetti' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
lineart of Big Bird by Alesso Baldovinetti

'vector art of a forest fire' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
vector art of a forest fire


Name: FuseDream
Author: Xingchao Liu et al
Original script: https://github.com/gnobitab/FuseDream
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Gives some unique outputs compared to all the previous scripts.

'a clown' FuseDream Text-to-Image
a clown

'a king' FuseDream Text-to-Image
a king

'a matte painting of New York City by Robin Guthrie' FuseDream Text-to-Image
a matte painting of New York City by Robin Guthrie

'a portrait of a young girl' FuseDream Text-to-Image
a portrait of a young girl

'a rough seascape' FuseDream Text-to-Image
a rough seascape

'a sea monster' FuseDream Text-to-Image
a sea monster

'a teddy bear' FuseDream Text-to-Image
a teddy bear

'a werewolf' FuseDream Text-to-Image
a werewolf

'an airbrush painting of an angry woman' FuseDream Text-to-Image
an airbrush painting of an angry woman

'an attractive woman' FuseDream Text-to-Image
an attractive woman


Name: Looking Glass
Author: bearsharktopus
Original script: https://colab.research.google.com/drive/11vdS9dpcZz2Q2efkOjcwyax4oob6N40G
Time for 265×256 on a 3090: 1 minute 19 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 03 seconds
Description: A variation on ruDALL-E that added support for training the output with a single image or directory of images. It does seem to create better results than the raw ruDALL-E scripts (starting from a single image of random Perlin noise).

'a cemetery trending on pixiv' Looking Glass Text-to-Image
a cemetery trending on pixiv

'a colorful parrot' Looking Glass Text-to-Image
a colorful parrot

'a photo of a house' Looking Glass Text-to-Image
a photo of a house

'a rough seascape' Looking Glass Text-to-Image
a rough seascape

'an alien city' Looking Glass Text-to-Image
an alien city

'an angry person by Eric Auld' Looking Glass Text-to-Image
an angry person by Eric Auld

'an angry woman' Looking Glass Text-to-Image
an angry woman

'an ugly woman' Looking Glass Text-to-Image
an ugly woman

'monkeys' Looking Glass Text-to-Image
monkeys

'Yoda' Looking Glass Text-to-Image
Yoda


Name: Velocity Diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128 1 minute 19 seconds
Description: The latest script from Katherine Crowson. Unique results compared to her previous diffusion based scripts. Worth experimenting with further.

'a detailed matte painting of traffic' Velocity Diffusion Text-to-Image
a detailed matte painting of traffic

'a detailed painting of Jason Vorhees' Velocity Diffusion Text-to-Image
a detailed painting of Jason Vorhees

'a Ghostbuster' Velocity Diffusion Text-to-Image
a Ghostbuster

'a manga drawing of a lounge room by Yayoi Kusama' Velocity Diffusion Text-to-Image
a manga drawing of a lounge room by Yayoi Kusama

'a mountain range CryEngine' Velocity Diffusion Text-to-Image
a mountain range CryEngine

'a portrait of a young girl made of feathers rendered in unreal engine' Velocity Diffusion Text-to-Image
a portrait of a young girl made of feathers rendered in unreal engine

'a zombie' Velocity Diffusion Text-to-Image
a zombie

'lineart of a Rubiks cube' Velocity Diffusion Text-to-Image
lineart of a Rubiks cube

'The Grinch' Velocity Diffusion Text-to-Image
The Grinch

'vector art of Emporer Palpatine' Velocity Diffusion Text-to-Image
vector art of Emporer Palpatine


Name: ruDALL-E Arbitrary Resolution v1
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 16 minutes 34 seconds
Description: Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.

'a color pencil sketch of a werewolf' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a color pencil sketch of a werewolf

'a colorful parrot' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a colorful parrot

'a gorilla' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a gorilla

'a painting of a cabin next to a stream in a secluded forest' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a painting of a cabin next to a stream in a secluded forest

'a portrait of a girl with a dragon tattoo' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a portrait of a girl with a dragon tattoo

'a rose vivid colors' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a rose vivid colors

'a sketch of an ugly man' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a sketch of an ugly man

'a surrealist sculpture of a submarine' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a surrealist sculpture of a submarine

'dense woodland' ruDALL-E Arbitrary Resolution v1 Text-to-Image
dense woodland

'medusa' ruDALL-E Arbitrary Resolution v1 Text-to-Image
medusa


Name: ruDALL-E Arbitrary Resolution v2
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 15 minutes 48 seconds
Description: v2 of the ruDALL-E Arbitrary Resolution script. Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.

'a bouquet of flowers' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a bouquet of flowers

'a cross stitch of a well kept garden' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a cross stitch of a well kept garden

'a futuristic city' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a futuristic city

'a large waterfall' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a large waterfall

'a minimalist painting of a castle in the mountains' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a minimalist painting of a castle in the mountains

'a photocopy of a monkey vivid colors' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a photocopy of a monkey vivid colors

'a spooky forest by Laura Muntz Lyall' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a spooky forest by Laura Muntz Lyall

'a teddy bear made of wrought iron' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a teddy bear made of wrought iron

'dense woodland' ruDALL-E Arbitrary Resolution v2 Text-to-Image
dense woodland

'God' ruDALL-E Arbitrary Resolution v2 Text-to-Image
God


Name: GLIDE
Author: Unknown
Original script: https://colab.research.google.com/github/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
Time for 256×256 on a 3090: 23 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: Locked to 256×256
Description: Images are rendered tiny at 64×64 and then upscaled internally within the script to 256×256 for ouput. The model has been “trimmed” so it cannot do anything human related and only does well for subjects it knows about. Hopefully they release the full model and/or train a larger resolutioon model in the future. Nothing to get excited about yet.

'a cathedral' GLIDE Text-to-Image
a cathedral

'a color pencil sketch of a fire breathing dragon by Erwin Bowien' GLIDE Text-to-Image
a color pencil sketch of a fire breathing dragon by Erwin Bowien

'a gorilla' GLIDE Text-to-Image
a gorilla

'a library' GLIDE Text-to-Image
a library

'a mosaic of monkeys' GLIDE Text-to-Image
a mosaic of monkeys

'a painting of a cabin next to a stream in a secluded forest' GLIDE Text-to-Image
a painting of a cabin next to a stream in a secluded forest

'an elephant' GLIDE Text-to-Image
an elephant

'dinosaurs' GLIDE Text-to-Image
dinosaurs

'goldfish' GLIDE Text-to-Image
goldfish

'the Sydney Harbour Bridge lens flare' GLIDE Text-to-Image
the Sydney Harbour Bridge lens flare


Name: Disco Diffusion
Author: @Somnai
Original script: https://colab.research.google.com/drive/1bItz4NdhAPHg5-u87KcH-MmJZjK-XqHN
Time for 512×512 on a 3090: 3 minutes 18 seconds
Maximum resolution on a 24 GB 3090: 2496×1088 11 minutes 50 seconds
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Diffusion script that includes all the latest features. Capable of rendering some very nice large resolution images (it may even do better at larger sized images than smaller resolutions like these samples).

'a cute creature' Disco Diffusion Text-to-Image
a cute creature

'a detailed matte painting of a morning landscape' Disco Diffusion Text-to-Image
a detailed matte painting of a morning landscape

'a peacock made of mist by Reinier Nooms' Disco Diffusion Text-to-Image
a peacock made of mist by Reinier Nooms

'a Pokemon character by William Etty' Disco Diffusion Text-to-Image
a Pokemon character by William Etty

'a polaroid photo of an angry woman' Disco Diffusion Text-to-Image
a polaroid photo of an angry woman

'a rough seascape' Disco Diffusion Text-to-Image
a rough seascape

'a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D' Disco Diffusion Text-to-Image
a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D

'an attractive woman' Disco Diffusion Text-to-Image
an attractive woman

'computer rendering of a desert oasis rendered in unreal engine' Disco Diffusion Text-to-Image
computer rendering of a desert oasis rendered in unreal engine

\

'the Amazon Rainforest by Qian Du' Disco Diffusion Text-to-Image
the Amazon Rainforest by Qian Du


Name: Infinite Diffusion
Author: https://github.com/crowsonkb/v-diffusion-pytorch
Original script: https://colab.research.google.com/drive/1VJrfInU5RbciXXD_8jzY-FntFqiyj6au
Time for 512×512 on a 3090: 3 minutes 32 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: 256×256 3 minutes 15 seconds
Description: Diffusion basecd script. Very VRAM hungry. Renders some unique images compared to the other methods.

'cookie monster eating a cookie' Infinite Diffusion Text-to-Image
cookie monster eating a cookie

'a renaissance painting of a farm by Bernardo Strozzi' Infinite Diffusion Text-to-Image
a renaissance painting of a farm by Bernardo Strozzi

'a silk screen of God' Infinite Diffusion Text-to-Image
a silk screen of God

'a storybook illustration of a cute monster trending on pixiv' Infinite Diffusion Text-to-Image
a storybook illustration of a cute monster trending on pixiv

'a surrealist painting of Frankenstein' Infinite Diffusion Text-to-Image
a surrealist painting of Frankenstein

'a watercolor painting of Yoda' Infinite Diffusion Text-to-Image
a watercolor painting of Yoda

'a worried woman made of clay lens flare' Infinite Diffusion Text-to-Image
a worried woman made of clay lens flare

'an art deco painting of Luke Skywalker' Infinite Diffusion Text-to-Image
an art deco painting of Luke Skywalker

'an oil painting of Buzz Lightyear' Infinite Diffusion Text-to-Image
an oil painting of Buzz Lightyear

'Chewbacca' Infinite Diffusion Text-to-Image
Chewbacca


Name: minDALL-E
Author: Kakao Brain Corp
Original script: https://github.com/kakaobrain/minDALL-E
Time for 256×256 on a 3090: 1 minutes 59 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: 256×256 1 minute 59 seconds
Description: Another DALL-E variation script. Locked to 256×256 but can geenrate multiple images each run.

'a cozy den' minDALL-E Text-to-Image
a cozy den

'a digital painting of Chewbacca by Willem van de Velde the Elder' minDALL-E Text-to-Image
a digital painting of Chewbacca by Willem van de Velde the Elder

'a sad person' minDALL-E Text-to-Image
a sad person

'a skull' minDALL-E Text-to-Image
a skull

'a storybook illustration of a happy clown by Gwen Barnard' minDALL-E Text-to-Image
a storybook illustration of a happy clown by Gwen Barnard

'a tree by Colin Gill' minDALL-E Text-to-Image
a tree by Colin Gill

'Bugs Bunny' minDALL-E Text-to-Image
Bugs Bunny

'fireworks by Károly Lotz' minDALL-E Text-to-Image
fireworks by Károly Lotz

'The Grand Canyon' minDALL-E Text-to-Image
The Grand Canyon

'Yoda' minDALL-E Text-to-Image
Yoda


Name: ruDOLPH
Author: SBER AI
Original script: https://github.com/sberbank-ai/ru-dolph
Time for 128×128 on a 3090: 1 minutes 15 seconds
Maximum resolution on a 24 GB 3090: Locked to 128×128
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another ruDALL-E variation script. Locked to a tiny 128×128 resolution for now until they train the larger models. These examples were 4x upscaled with Real ESRGAN.

'a castle' ruDOLPH Text-to-Image
a castle

'a colorful parrot' ruDOLPH Text-to-Image
a colorful parrot

'a fine art painting of an ugly woman' ruDOLPH Text-to-Image
a fine art painting of an ugly woman

'a kitchen' ruDOLPH Text-to-Image
a kitchen

'a pastel of spirals made of plastic' ruDOLPH Text-to-Image
a pastel of spirals made of plastic

'a photorealistic painting of a cityscape' ruDOLPH Text-to-Image
a photorealistic painting of a cityscape

'a portrait of a woman' ruDOLPH Text-to-Image
a portrait of a woman

'a sad person by Ramon Casas i CarbÃ' ruDOLPH Text-to-Image
a sad person by Ramon Casas i CarbÃ

'kittens' ruDOLPH Text-to-Image
kittens

'vector art of a woman' ruDOLPH Text-to-Image
vector art of a woman


Name: CLIP Guided Deep Image Prior
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1_oqIK8A67EgtJDdfsuJojc5ukNzirdle
Time for 512×512 on a 3090: 1 minutes 45 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 or 1680×720
Maximum resolution on an 8GB 2080: 512×512 (5 minutes 7 seconds) or 640×360
Description: Interesting script that has decent coherency. If only the output was slightly sharper and the colors slightly richer it would be a winner. Still good for unique outputs that the other methods cannot achieve.

'a flemish baroque of a shrine' CLIP Guided Deep Image Prior
a flemish baroque of a shrine

'a statue of a tardigrade made of clay' CLIP Guided Deep Image Prior
a statue of a tardigrade made of clay

'a surrealist painting of a Pixar character' CLIP Guided Deep Image Prior
a surrealist painting of a Pixar character

'a surrealist painting of an evening landscape 4K photo' CLIP Guided Deep Image Prior
a surrealist painting of an evening landscape 4K photo

'an abstract sculpture of an evil clown by Han Gan' CLIP Guided Deep Image Prior
an abstract sculpture of an evil clown by Han Gan

'an ambient occlusion render of Bugs Bunny made of wood' CLIP Guided Deep Image Prior
an ambient occlusion render of Bugs Bunny made of wood

'Cookie Monster' CLIP Guided Deep Image Prior
Cookie Monster

'Jabba The Hutt by Shūbun Tenshō' CLIP Guided Deep Image Prior
Jabba The Hutt by Shūbun Tenshō

'tentacles by Johanna Marie Fosie' CLIP Guided Deep Image Prior
tentacles by Johanna Marie Fosie

'vector art of heaven' CLIP Guided Deep Image Prior
vector art of heaven


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.