DevLog: ComfyUI, the framework for AI-generated images

I finally took the plunge into local AI image generation. After seeing what people were creating with ComfyUI, I had to try it myself. Here’s my setup journey and what I learned.

My Setup

Running locally because I wanted full control:

  • OS: Pop!_OS (Ubuntu-based Linux)
  • GPU: NVIDIA RTX 4070
  • Framework: ComfyUI with custom nodes

The GPU is crucial here. You can run on CPU, but we’re talking minutes per image instead of seconds.
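
For what it’s worth, ComfyUI’s launcher has a --cpu flag, so you can measure that gap yourself once it’s installed (flags change between versions, so check python main.py --help for the current set):

# Force CPU inference - fine for a sanity check, painful for real work
python main.py --cpu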

Installation Journey

Step 1: GPU Drivers

This was surprisingly painless on Linux:

# Check available drivers
sudo ubuntu-drivers list

# Install the recommended one
sudo apt install nvidia-driver-545

Reboot, and you’re golden. The days of fighting with NVIDIA drivers on Linux are mostly over.
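
Once you’re back up, a quick check confirms the driver loaded and the card is visible:

# Should report the driver version, the RTX 4070, and current VRAM usage
nvidia-smi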

Step 2: ComfyUI Installation

I went with manual installation instead of the portable version. As a developer, I like to see what’s happening under the hood:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
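
One thing I’d add: I ran the install inside a virtual environment to keep ComfyUI’s dependencies away from the system Python (my habit, not something the project requires). After that, starting the server is a single command, and the UI is served locally - on http://127.0.0.1:8188 by default:

# Optional: create and activate a venv before the pip install above
python -m venv venv
source venv/bin/activate

# Start ComfyUI and open the URL it prints (http://127.0.0.1:8188 by default)
python main.py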

Step 3: Custom Node Manager

This is the real game-changer. It handles installing missing custom nodes and models - checkpoints, CLIP, VAE files - all the stuff that would normally have you hunting through GitHub repos:

cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git

Now when a workflow needs a node or model I don’t have, the Manager can fetch it. No more “missing node” errors.
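
And since everything is just a git checkout, staying current is a couple of pulls (paths assume you cloned into ~/ComfyUI; the Manager can also run updates from its own UI):

# Update ComfyUI itself
cd ~/ComfyUI && git pull

# Update the Manager
cd custom_nodes/ComfyUI-Manager && git pull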

The Cloud Alternative

Don’t have a GPU? I discovered Modal, which lets you run ComfyUI in the cloud. It’s pay-as-you-go, so you’re not dropping $2k on a graphics card just to experiment.

The setup is more involved, but once it’s running, you get the same interface with cloud GPUs doing the heavy lifting.
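
I haven’t gone deep on Modal myself, but getting the client in place is quick; the ComfyUI deployment script is the involved part (comfy_app.py below is just a placeholder for whatever script you write or adapt from Modal’s examples):

# Install the Modal client and authenticate
pip install modal
modal setup

# Run your ComfyUI deployment script on Modal's cloud GPUs
modal run comfy_app.py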

Workflows That Blew My Mind

3D Mesh Generation

There’s a workflow that generates 3D meshes from text prompts. Not just depth maps - actual mesh files you can import into Blender. We’re living in the future.

SDXL with Fine Control

The basic SDXL workflow is nice, but the community has created versions with incredible control:

  • Separate prompts for composition and style
  • Multi-stage refinement
  • ControlNet integration for pose/depth guidance

Learning Resources

OpenArt.ai’s workflow academy has been invaluable. They break down complex workflows step by step. I started with their basic tutorials, and now I’m remixing them into my own workflows.

Performance Reality Check

With my RTX 4070:

  • 512x512 image: ~2 seconds
  • 1024x1024 image: ~8 seconds
  • SDXL at 1024x1024: ~15 seconds

Without GPU (CPU only):

  • 512x512 image: 2-3 minutes
  • Anything larger: Go make coffee

Tips for Beginners

  1. Start with existing workflows - Don’t try to build from scratch
  2. Use the manager - Seriously, it saves hours of debugging
  3. Monitor VRAM - 8GB is the minimum, 12GB is comfortable (see the snippet after this list)
  4. Join the Discord - The community is incredibly helpful
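
On the VRAM point, I just keep nvidia-smi running in a second terminal while I queue generations:

# Refresh GPU memory and utilization every second
watch -n 1 nvidia-smi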

What Surprised Me

The node-based approach is genius. Instead of writing code, you’re visually connecting processing steps. It makes experimentation so much faster.

Also, the ecosystem is moving at light speed. Every week there are new nodes, new models, new techniques. It’s exhausting but exciting.

Next Experiments

I want to try:

  • Training my own LoRA models
  • Real-time generation with StreamDiffusion
  • Combining with LLMs for automated prompt generation
  • Building a web UI on top for non-technical users

The barrier to entry for AI image generation has never been lower. If you’re curious, just grab ComfyUI and start playing. The worst that happens is you generate some weird images and learn something.