DevLog: ComfyUI, the framework for AI-generated images

I finally took the plunge into local AI image generation. After seeing what people were creating with ComfyUI, I had to try it myself. Here’s my setup journey and what I learned.

My Setup

Running locally because I wanted full control:

  • OS: Pop!_OS (Ubuntu-based Linux)
  • GPU: NVIDIA RTX 4070
  • Framework: ComfyUI with custom nodes

The GPU is crucial here. You can run on CPU, but we’re talking minutes per image instead of seconds.
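
For what it’s worth, ComfyUI’s launcher has a --cpu flag, so you can measure that gap yourself once it’s installed (flags change between versions, so check python main.py --help for the current set):

# Force CPU inference - fine for a sanity check, painful for real work
python main.py --cpu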

Installation Journey

Step 1: GPU Drivers

This was surprisingly painless on Linux:

# Check available drivers
sudo ubuntu-drivers list

# Install the recommended one
sudo apt install nvidia-driver-545

Reboot, and you’re golden. The days of fighting with NVIDIA drivers on Linux are mostly over.
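
Once you’re back up, a quick check confirms the driver loaded and the card is visible:

# Should report the driver version, the RTX 4070, and current VRAM usage
nvidia-smi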

Step 2: ComfyUI Installation

I went with manual installation instead of the portable version. As a developer, I like to see what’s happening under the hood:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
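
One thing I’d add: I ran the install inside a virtual environment to keep ComfyUI’s dependencies away from the system Python (my habit, not something the project requires). After that, starting the server is a single command, and the UI is served locally - on http://127.0.0.1:8188 by default:

# Optional: create and activate a venv before the pip install above
python -m venv venv
source venv/bin/activate

# Start ComfyUI and open the URL it prints (http://127.0.0.1:8188 by default)
python main.py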

Step 3: Custom Node Manager

This is the real game-changer. It handles installing missing custom nodes and models - checkpoints, CLIP, VAE files - all the stuff that would normally have you hunting through GitHub repos:

cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git

Now when a workflow needs a node or model I don’t have, the Manager can fetch it. No more “missing node” errors.
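
And since everything is just a git checkout, staying current is a couple of pulls (paths assume you cloned into ~/ComfyUI; the Manager can also run updates from its own UI):

# Update ComfyUI itself
cd ~/ComfyUI && git pull

# Update the Manager
cd custom_nodes/ComfyUI-Manager && git pull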

The Cloud Alternative

Don’t have a GPU? I discovered Modal, which lets you run ComfyUI in the cloud. It’s pay-as-you-go, so you’re not dropping $2k on a graphics card just to experiment.

The setup is more involved, but once it’s running, you get the same interface with cloud GPUs doing the heavy lifting.
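
I haven’t gone deep on Modal myself, but getting the client in place is quick; the ComfyUI deployment script is the involved part (comfy_app.py below is just a placeholder for whatever script you write or adapt from Modal’s examples):

# Install the Modal client and authenticate
pip install modal
modal setup

# Run your ComfyUI deployment script on Modal's cloud GPUs
modal run comfy_app.py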

Workflows That Blew My Mind

3D Mesh Generation

There’s a workflow that generates 3D meshes from text prompts. Not just depth maps - actual mesh files you can import into Blender. We’re living in the future.

SDXL with Fine Control

The basic SDXL workflow is nice, but the community has created versions with incredible control:

  • Separate prompts for composition and style
  • Multi-stage refinement
  • ControlNet integration for pose/depth guidance

Learning Resources

OpenArt.ai’s workflow academy has been invaluable. They break down complex workflows step by step. I started with their basic tutorials, and now I’m remixing them into my own workflows.

Performance Reality Check

With my RTX 4070:

  • 512x512 image: ~2 seconds
  • 1024x1024 image: ~8 seconds
  • SDXL at 1024x1024: ~15 seconds

Without GPU (CPU only):

  • 512x512 image: 2-3 minutes
  • Anything larger: Go make coffee

Tips for Beginners

  1. Start with existing workflows - Don’t try to build from scratch
  2. Use the manager - Seriously, it saves hours of debugging
  3. Monitor VRAM - 8GB is the minimum, 12GB is comfortable (see the snippet after this list)
  4. Join the Discord - The community is incredibly helpful
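
On the VRAM point, I just keep nvidia-smi running in a second terminal while I queue generations:

# Refresh GPU memory and utilization every second
watch -n 1 nvidia-smi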

What Surprised Me

The node-based approach is genius. Instead of writing code, you’re visually connecting processing steps. It makes experimentation so much faster.

Also, the ecosystem is moving at light speed. Every week there are new nodes, new models, new techniques. It’s exhausting but exciting.

Next Experiments

I want to try:

  • Training my own LoRA models
  • Real-time generation with StreamDiffusion
  • Combining with LLMs for automated prompt generation
  • Building a web UI on top for non-technical users

The barrier to entry for AI image generation has never been lower. If you’re curious, just grab ComfyUI and start playing. The worst that happens is you generate some weird images and learn something.