Image Generation Using GenAI

Introduction

Generative AI (GenAI) lets anyone create new images from a text prompt. A trained machine learning model generates a unique visual that matches the prompt.

Users only need to enter a prompt; no design skills are required. Image generation models are trained on extensive datasets and draw on what they have learned to produce new visuals.

Such models are typically built on generative adversarial networks (GANs) or, like the model used in this article, on diffusion models. The capability has applications across many fields, including website graphics, marketing materials, and advertising campaigns.

This approach is fast and often delivers impressive results in both quality and relevance to the prompt.

Implementation

We will use ByteDance/SDXL-Lightning, a text-to-image model based on Stable Diffusion XL (SDXL). It relies on a diffusion distillation method that, according to its authors, achieves a new state of the art in one-step/few-step 1024px text-to-image generation. The method combines progressive and adversarial distillation to balance quality and mode coverage.
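Each SDXL-Lightning checkpoint is tied to a step count (the file used later in this article is the 4-step one), so the checkpoint and the number of inference steps have to agree. The sketch below shows one way to keep them in sync. The helper `checkpoint_for_steps` is hypothetical, and it assumes the repository publishes 2-, 4- and 8-step UNet files under the same naming pattern as the 4-step file; check the model card before relying on it.

```python
# Hypothetical helper: derive the checkpoint file name from the step count.
# Assumption: 2-, 4- and 8-step UNet files follow the naming of the 4-step
# file used in this article; verify against the model repository.
ASSUMED_STEP_COUNTS = (2, 4, 8)


def checkpoint_for_steps(steps: int) -> str:
    """Return the assumed checkpoint file name for a given step count."""
    if steps not in ASSUMED_STEP_COUNTS:
        raise ValueError(f"no known checkpoint for {steps} steps")
    return f"sdxl_lightning_{steps}step_unet.safetensors"


num_steps = 4
ckpt = checkpoint_for_steps(num_steps)
print(ckpt)
```

The same `num_steps` value can then be passed as `num_inference_steps` when the pipeline is called, so the two settings cannot drift apart.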

Getting Started

First, set up a new Python environment in VSCode. Open a PowerShell terminal within VSCode and run ->

PowerShell

python -m venv .venv

to create a virtual environment. Then activate it for your workspace from the same terminal ->

PowerShell

.venv\Scripts\Activate.ps1
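To confirm that the activation worked, one quick check (a sketch, not part of the original setup) is to compare Python's `sys.prefix` with `sys.base_prefix`; inside an environment created by `venv` the two differ. The function name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter:", sys.executable)
```

If the first line prints `False`, the packages in the next step would be installed into the global Python installation instead of `.venv`.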

Now, install the required libraries with pip ->

PowerShell

pip install accelerate==0.28.0
pip install certifi==2024.2.2
pip install charset-normalizer==3.3.2
pip install colorama==0.4.6
pip install diffusers==0.26.3
pip install filelock==3.13.1
pip install fsspec==2024.2.0
pip install huggingface-hub==0.21.4
pip install idna==3.6
pip install importlib_metadata==7.0.2
pip install Jinja2==3.1.3
pip install MarkupSafe==2.1.5
pip install mpmath==1.3.0
pip install networkx==3.2.1
pip install numpy==1.26.4
pip install packaging==24.0
pip install pillow==10.2.0
python -m pip install pip==24.0
pip install psutil==5.9.8
pip install PyYAML==6.0.1
pip install regex==2023.12.25
pip install requests==2.31.0
pip install safetensors==0.4.2
pip install setuptools==58.1.0
pip install sympy==1.12
pip install tokenizers==0.15.2
pip install tqdm==4.66.2
pip install transformers==4.38.2
pip install typing_extensions==4.10.0
pip install urllib3==2.2.1
pip install zipp==3.17.0
pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu121/torch_stable.html

Finally, install the NVIDIA driver with CUDA support for your graphics card; the script below runs the model on a CUDA device.
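Before running the generation script, it can help to verify the installation. The following is a minimal sketch that compares a few of the pinned versions from the install step with what is actually installed and reports whether PyTorch can see a CUDA device. The names `EXPECTED`, `installed_versions` and `cuda_status` are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# A few of the pins from the install step above (not the full list).
EXPECTED = {
    "diffusers": "0.26.3",
    "transformers": "4.38.2",
    "accelerate": "0.28.0",
    "safetensors": "0.4.2",
}


def installed_versions(expected):
    """Map each package name to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found)
    return report


def cuda_status():
    """Describe whether PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return "CUDA device: " + torch.cuda.get_device_name(0)
    return "torch is installed, but no CUDA device is visible"


for name, (wanted, found) in installed_versions(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {found}")
print(cuda_status())
```

If no CUDA device is reported, the `.to("cuda")` calls in the script will fail, so the driver installation is worth revisiting first.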

Code

Create a new Python file named ImageGeneration.py with the following script ->

Python

import torch
from diffusers import (
    StableDiffusionXLPipeline,
    UNet2DConditionModel,
    EulerDiscreteScheduler,
)
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import datetime
import os

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_unet.safetensors"  # Use the correct ckpt for your step setting!

# Load model.
unet = UNet2DConditionModel.from_config(base, subfolder="unet").to(
    "cuda", torch.float16
)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
pipe = StableDiffusionXLPipeline.from_pretrained(
    base, unet=unet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Ensure sampler uses "trailing" timesteps.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)

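# Make sure the output folder exists before saving into it.
os.makedirs("./output", exist_ok=True)

# Keep asking for prompts until the script is interrupted (Ctrl+C).
while True:
    prompt = input("Enter your description for generating the image:\n")
    # Note: characters that are invalid in file names (e.g. ? or :) are not removed.
    file_name = (
        "./output/"
        + prompt.replace(" ", "").strip()
        + datetime.datetime.now().strftime("%d_%m_%y_%H_%M_%S")
        + ".png"
    )
    # Ensure using the same inference steps as the loaded model and CFG set to 0.
    pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0].save(file_name)
    print("Generated Image saved at : ", os.path.abspath(file_name), "\n")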
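The `timestep_spacing="trailing"` setting matters because a few-step sampler should start from the noisiest timestep. The sketch below reproduces in plain Python how the two spacings pick timesteps, as we understand the scheduler's arithmetic. It is an illustration that assumes 1000 training timesteps, not the library's own code, and the function names are made up for this example.

```python
def trailing_timesteps(num_train: int, num_steps: int) -> list[int]:
    """Evenly spaced timesteps counted back from the last training timestep."""
    ratio = num_train / num_steps
    return [round(num_train - i * ratio) - 1 for i in range(num_steps)]


def leading_timesteps(num_train: int, num_steps: int, offset: int = 0) -> list[int]:
    """Evenly spaced timesteps counted up from zero, returned noisiest first."""
    ratio = num_train // num_steps
    return [i * ratio + offset for i in reversed(range(num_steps))]


print(trailing_timesteps(1000, 4))  # [999, 749, 499, 249]
print(leading_timesteps(1000, 4))   # [750, 500, 250, 0]
```

With trailing spacing the first step is taken at timestep 999, the fully noised end of the schedule, whereas leading spacing would begin at 750 (plus any scheduler offset) and skip the noisiest part.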


If everything is configured correctly, the output should look something like this ->

PowerShell

(.venv) PS C:\Code\Python\Environment\ImageGeneration> python .\ImageGeneration.py
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:02<00:00,  2.81it/s]
Enter your description for generating the image:
A cat looking into the mirror imagining itself as a tiger.
  0%|                                                                                                                                                                                                              | 0/4 [00:00<?, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [04:10<00:00, 62.69s/it]
Generated Image saved at :  C:\Code\Python\Environment\ImageGeneration\output\Acatlookingintothemirrorimaginingitselfasatiger.15_03_24_03_05_09.png 

Enter your description for generating the image:
Sir Isaac Newton sitting under the Apple Tree.
  0%|                                                                                                                                                                                                              | 0/4 [00:00<?, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [04:30<00:00, 67.52s/it]
Generated Image saved at :  C:\Code\Python\Environment\ImageGeneration\output\SirIsaacNewtonsittingundertheAppleTree.15_03_24_03_30_07.png 

Enter your description for generating the image:
Man on walking on the Moon watches the Earth.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [04:08<00:00, 62.14s/it]
Generated Image saved at :  C:\Code\Python\Environment\ImageGeneration\output\ManonwalkingontheMoonwatchestheEarth.15_03_24_04_00_03.png
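The file names above are built by stripping spaces from the prompt, so punctuation stays in the name (note the `.` before each timestamp), and characters such as `?` or `:` would make the save fail on Windows. A possible alternative, sketched here with the hypothetical helpers `safe_stem` and `output_path`, replaces anything outside letters, digits, `-` and `_`, and creates the output folder if it is missing.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def safe_stem(prompt: str, max_len: int = 80) -> str:
    """Turn a prompt into a file-name-safe stem."""
    # Replace each run of characters other than letters, digits, "-" and "_"
    # with a single underscore, then trim underscores from both ends.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt).strip("_")
    return stem[:max_len] or "image"


def output_path(prompt: str, out_dir: str = "output",
                now: Optional[datetime] = None) -> Path:
    """Build <out_dir>/<stem>_<timestamp>.png, creating out_dir if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%d_%m_%y_%H_%M_%S")
    return folder / f"{safe_stem(prompt)}_{stamp}.png"


print(output_path("Is this a cat: or a tiger?"))
```

In the script, `file_name = output_path(prompt)` could then replace the string concatenation; `save` and `os.path.abspath` both accept the resulting path object.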