How to Build a Text-to-Image App with a Free AI Model in 30 Lines of Code
Generate amazing images with AI: a step-by-step guide to building a Text-to-Image app
In our fast-changing world, AI is taking center stage. It is not surprising that AI has caught our attention, because it has the potential to improve efficiency in various industries. You may have experimented with text-to-image tools like Midjourney and DreamStudio, but in this article I will guide you step by step through creating your very own "Midjourney-style" text-to-image generator app from scratch using Stable Diffusion. It takes about 30 lines of code, and since Stable Diffusion is open source, the model is free.
Pre-requisites
Let’s make sure you have everything you need before diving into building your own text-to-image generator app.
- You need a laptop with a GPU. Don’t worry if your computer lacks one: you can use Google Colab to access a free GPU, although a laptop with its own GPU is more convenient.
- Set up an Integrated Development Environment (IDE) with a Python environment.
With these vital conditions in place, you are set.
Introduction
Let us get a brief understanding of diffusion models, the class of models used here for generating images from text. A diffusion model uses a Markov chain to gradually add noise to data, then learns to reverse the process and produce the desired data sample from noise. Notable examples of diffusion models include DALL·E 2 by OpenAI, Imagen by Google, and Stable Diffusion developed by Stability AI. In this article, our focus will be on using Stable Diffusion from Stability AI to build our app. Even though generative models like Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and flow-based models are well known, diffusion models are more recent and currently produce some of the best results.
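The forward (noising) half of that process can be sketched in a few lines of plain Python. This is only an illustration of the idea, not part of the app: `alpha_bar_t` stands in for the cumulative signal level at a given diffusion step, and real implementations operate on image tensors rather than small lists of numbers.

```python
import math
import random

def forward_diffuse(x0, alpha_bar_t, noise=None):
    """Noise a clean sample x0 toward pure Gaussian noise.
    alpha_bar_t is the cumulative signal level: 1.0 keeps the sample
    unchanged, 0.0 replaces it entirely with noise."""
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [signal * x + sigma * n for x, n in zip(x0, noise)]

# The reverse process -- which diffusion models learn -- walks this
# chain backwards, turning noise into a sample step by step.
noisy = forward_diffuse([0.5, -0.25, 1.0], 0.9)
```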
Setup and installation
We can access the Stable Diffusion model through Hugging Face. Hugging Face is a community and platform that provides access to open-source machine learning models and datasets. Since the models are open source, Hugging Face is free to use, though there is also a paid tier. You will need a Hugging Face profile if you don’t have one already. After creating a profile, you need an “Access Token”:
- Click on your profile icon
- Click on “Settings”
- Navigate to “Access Tokens” on the left tab
- You can either generate a new token or utilize an existing one.
- Copy the token
The token needed to access the Stable Diffusion model is free, which makes Hugging Face a powerful resource.
Create a file named “authtoken.py”. This file will contain your Hugging Face access token.
# authtoken.py
# How to get one: https://huggingface.co/docs/hub/security-tokens
auth_token = "{COPY ACCESS TOKEN FROM HUGGING FACE}"
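If you would rather not keep the token in a source file (which is easy to commit to version control by accident), you can read it from an environment variable instead. This is an optional sketch; the variable name `HF_TOKEN` is our choice here, and the app would import and use the result the same way as `auth_token`:

```python
import os

def load_auth_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face access token from an environment variable.
    Fails loudly instead of passing an empty token to the model download."""
    token = os.environ.get(env_var, "")
    if not token:
        raise RuntimeError(f"Set {env_var} to your Hugging Face access token")
    return token
```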
Install all requirements
Create a file named “requirements.txt”. If you are familiar with coding in Python, you already know what this file is for: it lists all the libraries that need to be installed for a Python project.
# requirements.txt
customtkinter==4.6.1
darkdetect==0.7.1
diffusers==0.3.0
filelock==3.8.0
huggingface-hub==0.9.1
idna==3.4
importlib-metadata==4.12.0
numpy==1.23.3
packaging==21.3
Pillow==9.2.0
pyparsing==3.0.9
PyYAML==6.0
regex==2022.9.13
requests==2.28.1
tk==0.1.0
tokenizers==0.12.1
torch==1.12.1+cu113
torchaudio==0.12.1+cu113
torchvision==0.13.1+cu113
tqdm==4.64.1
transformers==4.22.1
typing_extensions==4.3.0
urllib3==1.26.12
zipp==3.8.1
Run the command below to install the libraries in your requirements.txt
pip install -r requirements.txt
Building the app
To ensure a clear understanding of the process, let’s outline the step by step approach we will follow in building the app.
- Import all necessary libraries
- Create a user interface with Tkinter and CustomTkinter
- Download the Stable Diffusion model from Hugging Face
- Implement the “generate_image” function
- Create a button to trigger the “generate_image” function
Import all necessary libraries — We start by importing the necessary libraries into our code; this gives our app access to all the required functionality.
# Libraries for building GUI
import tkinter as tk
import customtkinter as ctk
# Machine Learning libraries
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline
# Libraries for processing image
from PIL import ImageTk
# private modules
from authtoken import auth_token
Create a user interface with Tkinter and CustomTkinter — Tkinter is a well-known Python library for creating a Graphical User Interface (GUI) for an application. It gives users a visual way to interact with the Stable Diffusion model and see the results it generates.
# Create app user interface
app = tk.Tk()
app.geometry("532x632")
app.title("Text to Image app")
app.configure(bg='black')
ctk.set_appearance_mode("dark")
# Create input box on the user interface
prompt = ctk.CTkEntry(height=40, width=512, text_font=("Arial", 15), text_color="white", fg_color="black")
prompt.place(x=10, y=10)
# Create a placeholder to show the generated image
img_placeholder = ctk.CTkLabel(height=512, width=512, text="")
img_placeholder.place(x=10, y=110)
Download the Stable Diffusion model from Hugging Face — we use the Hugging Face platform to get the Stable Diffusion model. This model serves as the backbone for generating images from text prompts.
# Download stable diffusion model from hugging face
modelid = "CompVis/stable-diffusion-v1-4"
device = "cuda"
stable_diffusion_model = StableDiffusionPipeline.from_pretrained(modelid, revision="fp16", torch_dtype=torch.float16, use_auth_token=auth_token)
stable_diffusion_model.to(device)
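The code above hard-codes `device = "cuda"` and half-precision (fp16) weights, which assumes an NVIDIA GPU. A small helper makes the fallback explicit. This is a sketch, not part of the original app; in practice the flag would come from `torch.cuda.is_available()`:

```python
def select_device_and_dtype(cuda_available: bool):
    """Pick a device string and a weight precision for the pipeline.
    Half precision (float16) only makes sense on a GPU; on CPU we fall
    back to full precision, which is much slower but still works."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"
```

In “app.py” you would call it as `device, dtype = select_device_and_dtype(torch.cuda.is_available())` and pass the matching `torch_dtype` to `from_pretrained`.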
Implement the “generate_image” function — develop a function called “generate_image” that is triggered whenever a click event occurs on the user interface. This function sends the text prompt to the Stable Diffusion model and receives an image as a response. The image is then displayed within the user interface.
# Generate image from text
def generate_image():
    """Generate an image from the prompt text with Stable Diffusion."""
    with autocast(device):
        image = stable_diffusion_model(prompt.get(), guidance_scale=8.5)["sample"][0]
    # Save the generated image
    image.save('generatedimage.png')
    # Display the generated image on the user interface
    img = ImageTk.PhotoImage(image)
    img_placeholder.configure(image=img)
    img_placeholder.image = img  # keep a reference so the image is not garbage-collected
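One caveat worth knowing: the function above runs the model on the Tk main thread, so the window freezes until generation finishes. A common remedy is to do the slow work on a background thread and hand the result back for display. The helper below is a generic sketch of that pattern, not part of the original 30 lines; note that Tk widgets should only be updated from the main thread, e.g. by scheduling the display step with `app.after(...)` inside `on_done`.

```python
import threading

def run_in_background(task, on_done):
    """Run a slow callable off the main thread, then pass its result
    to on_done. Keeps a Tkinter event loop responsive during the work."""
    def worker():
        result = task()
        on_done(result)
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```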
Create a button to trigger the “generate_image” function — implement a button within the user interface that users can click to run “generate_image”. The button gives users an easy way to create images from their input text.
trigger = ctk.CTkButton(height=40, width=120, text_font=("Arial", 15), text_color="black", fg_color="white",
                        command=generate_image)
trigger.configure(text="Generate")
trigger.place(x=206, y=60)
app.mainloop()
Execute App
To execute the code, let us consolidate everything into a Python script named “app.py”. This script is the single file our application runs from. Once you have saved the code as “app.py”, run it with the following command in your command line interface or terminal:
python app.py
Executing this command runs the Python script, launching our application and enabling users to interact with the app.
# app.py
# Libraries for building GUI
import tkinter as tk
import customtkinter as ctk
# Machine Learning libraries
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline
# Libraries for processing image
from PIL import ImageTk
# private modules
from authtoken import auth_token
# Create app user interface
app = tk.Tk()
app.geometry("532x632")
app.title("Text to Image app")
app.configure(bg='black')
ctk.set_appearance_mode("dark")
# Create input box on the user interface
prompt = ctk.CTkEntry(height=40, width=512, text_font=("Arial", 15), text_color="white", fg_color="black")
prompt.place(x=10, y=10)
# Create a placeholder to show the generated image
img_placeholder = ctk.CTkLabel(height=512, width=512, text="")
img_placeholder.place(x=10, y=110)
# Download stable diffusion model from hugging face
modelid = "CompVis/stable-diffusion-v1-4"
device = "cuda"
stable_diffusion_model = StableDiffusionPipeline.from_pretrained(modelid, revision="fp16", torch_dtype=torch.float16, use_auth_token=auth_token)
stable_diffusion_model.to(device)
# Generate image from text
def generate_image():
    """Generate an image from the prompt text with Stable Diffusion."""
    with autocast(device):
        image = stable_diffusion_model(prompt.get(), guidance_scale=8.5)["sample"][0]
    # Save the generated image
    image.save('generatedimage.png')
    # Display the generated image on the user interface
    img = ImageTk.PhotoImage(image)
    img_placeholder.configure(image=img)
    img_placeholder.image = img  # keep a reference so the image is not garbage-collected
trigger = ctk.CTkButton(height=40, width=120, text_font=("Arial", 15), text_color="black", fg_color="white",
                        command=generate_image)
trigger.configure(text="Generate")
trigger.place(x=206, y=60)
app.mainloop()
You can also find the complete project on my GitHub.
Conclusion
We learned how to build a text-to-image app in about 30 lines of code, using Tkinter for the user interface and the Stable Diffusion model from Stability AI via Hugging Face. Now you can create your own AI text-to-image generation tool.