In text-to-image modelling, Stable Diffusion has greatly increased the pace of development of generative models. However, it does not come without its problems - including slow convergence and difficulty handling high-dimensional data (S., S., n.d.). In response, some researchers have proposed a finetuned variant of the model, named DreamShaper.
In fact, DreamShaper already has 8 versions - of which the last is presumed to be the final one.
If you wish to know more about the Stable Diffusion problems it solves, read this article. Here, we'll focus on using DreamShaper instead!
In this article, we're going to build a diffusers pipeline with DreamShaper 7 - more precisely, with an LCM-LoRA-finetuned version of it, which speeds up inference. See the header image for what it's capable of generating!
In order to run the code you'll create, you need to install torch, diffusers and matplotlib. Depending on your diffusers version, you may also need transformers, accelerate and peft - recent diffusers releases rely on peft for loading LoRA weights.
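Assuming a standard pip setup, the installation could look like this (the exact set of extra packages depends on your environment):
> pip install torch diffusers transformers accelerate peft matplotlib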
Let's create a file named dreamshaper7pipeline.py
. In it, we start with the imports and define some settings:
import torch
from diffusers import DiffusionPipeline, LCMScheduler
import matplotlib.pyplot as plt
size = 512 # 512x512 pixels
num_inference_steps = 4 # number of diffusion steps
guidance_scale = 0.0 # no guidance
Torch is needed because diffusers
depends on it; we'll visualize the images with Matplotlib.
As you can see, in this example we're generating 512 x 512 pixel images. Feel free to choose a smaller or larger size, but do recognize that this may impact the hardware you'll need to run the pipeline successfully! We use 4 diffusion steps and set the guidance scale to 0.0, disabling classifier-free guidance (a technical step related to the LCM-LoRA process).
The next step involves actually creating the DreamShaper 7 pipeline. We're using HuggingFace's DiffusionPipeline
for this purpose.
The DiffusionPipeline is the quickest way to load any pretrained diffusion pipeline from the Hub for inference (HuggingFace, n.d.).
We do this by initializing the DiffusionPipeline
from the pretrained Lykon/dreamshaper-7
model. Subsequently, we check if CUDA is available - in other words, if you can run this pipeline on your GPU - and if so, enable it. This will speed up running the pipeline significantly.
Then, we configure the LCMScheduler
from the pipeline's existing scheduler configuration and load the latent-consistency/lcm-lora-sdv1-5
weights. These are LoRA weights, meaning that the model was finetuned using the LoRA technique. However, it was done in a particular way: to enable fast inference. In fact, using these weights lets the model generate good samples with only a handful of diffusion steps, which speeds up inference a lot.
Multistep and onestep scheduler (Algorithm 3) introduced alongside latent consistency models in the paper Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. This scheduler should be able to generate good samples from LatentConsistencyModelPipeline in 1-8 steps (HuggingFace, n.d.).
Finally, we return the pipeline.
Here's the code:
def load_dreamshaper_lora_pipeline():
    """
    Load the DreamShaper 7 model with LCM LoRA adapters for fast inference.
    """
    # Create a DiffusionPipeline using the pretrained DreamShaper 7 model
    pipeline = DiffusionPipeline.from_pretrained("Lykon/dreamshaper-7")

    # Use CUDA if available
    if torch.cuda.is_available():
        pipeline.to("cuda")

    # Use the LCM LoRA adapters for fast inference
    pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
    pipeline.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

    return pipeline
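As an aside: if you want to verify the documentation's claim that 1-8 steps suffice, you could compare a few step counts using the function we just defined. This is a minimal, optional sketch - the prompt and file names are just examples:
# Optional experiment: compare LCM step counts (uses the function above)
pipeline = load_dreamshaper_lora_pipeline()
for steps in [1, 2, 4, 8]:
    result = pipeline(
        prompt="An orange at a beach",
        num_inference_steps=steps,
        guidance_scale=0.0,
    )
    # Save each result so the quality/speed trade-off can be inspected
    result.images[0].save(f"orange_{steps}_steps.png")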
Then, we ask the user for the prompt - in other words, what they want to visualize:
def ask_for_prompt():
    """
    Ask the user for a prompt.
    """
    prompt = input("What do you want to visualize?\n")
    return prompt
This is followed by a definition which allows us to generate the images. It takes the pipeline
, the prompt
and some extra settings:
- The num_inference_steps
parameter, which indicates the number of diffusion steps during inference. In our case, that's 4 steps.
- The guidance_scale
parameter, which controls classifier-free guidance; in our case, we disable it.
- The size
parameter, describing the width and the height of the image.
def generate_images(pipeline, prompt, num_inference_steps, guidance_scale, size):
    """
    Generate images using the pipeline.
    """
    results = pipeline(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        height=size,
        width=size
    )
    return results
Subsequently, we show the final image - this part just visualizes the result with Matplotlib.
def show_image(results):
    """
    Show an image.
    """
    # Create a figure without any border or axis
    fig, ax = plt.subplots(figsize=(30, 30))
    ax.imshow(results.images[0])
    ax.axis('off')  # Turn off axis labels and ticks

    # Show the image without borders
    plt.subplots_adjust(left=0, right=1, top=1, bottom=0)  # Remove extra white space
    plt.show()
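If you'd rather write the result to disk than display it, note that the pipeline returns standard PIL images, which come with a save method. A minimal optional helper could look like this - the function name and output path are just examples:
def save_image(results, path="output.png"):
    """
    Save the first generated image to disk.
    Relies on diffusers pipelines returning standard PIL images.
    """
    results.images[0].save(path)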
Finally, we combine everything in a main
definition:
def main():
    """
    Main function.
    """
    # Load the pipeline
    pipeline = load_dreamshaper_lora_pipeline()

    # Ask for a prompt
    prompt = ask_for_prompt()

    # Generate images
    results = generate_images(pipeline, prompt, num_inference_steps, guidance_scale, size)

    # Show the image
    show_image(results)


if __name__ == "__main__":
    main()
Let's now run the script.
> python dreamshaper7pipeline.py
...and now take a look at what it produces for some basic prompts.
An orange at a beach:
The skyline of New York City during sunset, dreamscape:
I also let ChatGPT generate a more complex prompt.
Create an image that combines the concept of 'bioluminescent jungle' with 'steampunk cityscape.' Imagine a lush, glowing forest filled with exotic flora and fauna juxtaposed against a sprawling metropolis of intricate, Victorian-inspired machinery. The blending of natural wonder and mechanical innovation should be visually stunning and captivating.
This is what it looks like:
Here's another one:
Imagine a world where gravity is reversed, and people live on the undersides of floating islands in the sky. Create an image that showcases the everyday life of the island-dwellers, from their upside-down houses and gardens to their unique modes of transportation. Highlight the challenges and innovations of living in a world with 'reverse gravity.'
Pretty awesome!
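Note that diffusion sampling is stochastic, so your images will differ from the ones above and between runs. If you want reproducible outputs, you can pass a seeded torch.Generator to the pipeline call. A minimal sketch, assuming the variables from our script; the seed of 42 is arbitrary:
# Optional: reproducible generation with a fixed seed
device = "cuda" if torch.cuda.is_available() else "cpu"
generator = torch.Generator(device).manual_seed(42)
results = pipeline(
    prompt=prompt,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
    height=size,
    width=size,
    generator=generator,
)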
Here's the full code if you're interested:
import torch
from diffusers import DiffusionPipeline, LCMScheduler
import matplotlib.pyplot as plt

size = 512  # 512x512 pixels
num_inference_steps = 4  # number of diffusion steps
guidance_scale = 0.0  # no guidance


def load_dreamshaper_lora_pipeline():
    """
    Load the DreamShaper 7 model with LCM LoRA adapters for fast inference.
    """
    # Create a DiffusionPipeline using the pretrained DreamShaper 7 model
    pipeline = DiffusionPipeline.from_pretrained("Lykon/dreamshaper-7")

    # Use CUDA if available
    if torch.cuda.is_available():
        pipeline.to("cuda")

    # Use the LCM LoRA adapters for fast inference
    pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
    pipeline.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

    return pipeline


def ask_for_prompt():
    """
    Ask the user for a prompt.
    """
    prompt = input("What do you want to visualize?\n")
    return prompt


def generate_images(pipeline, prompt, num_inference_steps, guidance_scale, size):
    """
    Generate images using the pipeline.
    """
    results = pipeline(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        height=size,
        width=size
    )
    return results


def show_image(results):
    """
    Show an image.
    """
    # Create a figure without any border or axis
    fig, ax = plt.subplots(figsize=(30, 30))
    ax.imshow(results.images[0])
    ax.axis('off')  # Turn off axis labels and ticks

    # Show the image without borders
    plt.subplots_adjust(left=0, right=1, top=1, bottom=0)  # Remove extra white space
    plt.show()


def main():
    """
    Main function.
    """
    # Load the pipeline
    pipeline = load_dreamshaper_lora_pipeline()

    # Ask for a prompt
    prompt = ask_for_prompt()

    # Generate images
    results = generate_images(pipeline, prompt, num_inference_steps, guidance_scale, size)

    # Show the image
    show_image(results)


if __name__ == "__main__":
    main()