This project integrates DiffSynth Engine for text-to-image generation and supports several diffusion models.
Make sure diffsynth-engine and its dependencies are installed:
# Install diffsynth-engine
uv pip install diffsynth-engine
# If you are using CUDA
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
Alternatively, install the project's optional dependencies:
uv sync --extra diffsynth --extra cuda
Start the Gradio app:
python gradio_app.py
Then open http://localhost:7860 and, in the "Text-to-Image" tab:
from diffsynths.text_to_image import generate_image
# Basic usage
image_path = generate_image(
prompt="a beautiful sunset over mountains",
negative_prompt="low quality, blurry",
model_type="sd",
width=512,
height=512,
)
print(f"Image saved to: {image_path}")
Start the FastAPI service:
python app.py
curl -X POST "http://localhost:8000/api/text-to-image/generate" \
-H "Content-Type: application/json" \
-d '{
"prompt": "a beautiful sunset over mountains",
"negative_prompt": "low quality, blurry",
"model_type": "sd",
"width": 512,
"height": 512,
"num_inference_steps": 20,
"guidance_scale": 7.5,
"seed": 42
}'
Response:
{
"success": true,
"image_path": "outputs/text_to_image/xxxxx.png",
"message": "图像生成成功"
}
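The same request can also be built from Python. The following is a minimal sketch that uses only the standard library; the endpoint path and field names come from the curl example above, while the helper name `build_generate_request` and the default base URL are assumptions made for illustration.

```python
import json
import urllib.request


def build_generate_request(prompt, base_url="http://localhost:8000", **options):
    """Build a POST request for the /api/text-to-image/generate endpoint.

    `build_generate_request` is a hypothetical helper, not part of the project.
    Extra keyword arguments (width, height, seed, ...) are sent as JSON fields.
    """
    payload = {"prompt": prompt, **options}
    return urllib.request.Request(
        url=f"{base_url}/api/text-to-image/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "a beautiful sunset over mountains",
    negative_prompt="low quality, blurry",
    model_type="sd",
    width=512,
    height=512,
)

# Sending the request requires the FastAPI service to be running:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
#     print(result["image_path"])
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.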
curl -X POST "http://localhost:8000/api/text-to-image/batch-generate" \
-H "Content-Type: application/json" \
-d '{
"prompts": [
"a red apple",
"a blue car",
"a green tree"
],
"model_type": "sd",
"width": 512,
"height": 512
}'
curl -X POST "http://localhost:8000/api/text-to-image/load-model?model_type=sdxl"
curl -X POST "http://localhost:8000/api/text-to-image/unload-model"
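For scripts that switch models between runs, the two endpoints above can be addressed from Python. This is a small sketch under the assumption that the service listens on localhost:8000 as in the curl examples; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed local address of the FastAPI service, as used in the curl examples.
BASE_URL = "http://localhost:8000/api/text-to-image"


def load_model_url(model_type):
    """Return the load-model URL with model_type passed as a query parameter."""
    return f"{BASE_URL}/load-model?{urlencode({'model_type': model_type})}"


def unload_model_url():
    """Return the unload-model URL, which takes no parameters."""
    return f"{BASE_URL}/unload-model"


print(load_model_url("sdxl"))
print(unload_model_url())

# Both endpoints expect POST, for example:
# import urllib.request
# urllib.request.urlopen(urllib.request.Request(load_model_url("sdxl"), method="POST"))
```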
from diffsynths.text_to_image import get_generator
generator = get_generator()
generator.load_model("sd")
prompts = [
"a red apple on a table",
"a blue car in the street",
"a green tree in the park",
]
image_paths = generator.batch_generate(
prompts=prompts,
width=512,
height=512,
num_inference_steps=20,
guidance_scale=7.5,
seed=42, # images use seed, seed+1, seed+2, ... in order
)
for i, path in enumerate(image_paths):
print(f"Image {i + 1} saved to: {path}")
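The seed comment in the batch example implies a simple per-image seed sequence. The sketch below only illustrates that arithmetic; `batch_seeds` is a hypothetical helper and is not claimed to be how `batch_generate` is implemented internally.

```python
def batch_seeds(seed, count):
    """Return the seeds implied by the comment above: seed, seed+1, seed+2, ..."""
    return [seed + offset for offset in range(count)]


prompts = [
    "a red apple on a table",
    "a blue car in the street",
    "a green tree in the park",
]

# Pair each prompt with the seed it would receive in a batch started at 42.
for prompt, image_seed in zip(prompts, batch_seeds(42, len(prompts))):
    print(f"seed {image_seed}: {prompt}")
```

Knowing the seed of an individual image is useful when you want to regenerate just that image later with the same settings.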
from diffsynths.text_to_image import TextToImageGenerator
generator = TextToImageGenerator(
model_path="/path/to/your/model.safetensors",
device="cuda"
)
generator.load_model("sd")
image_path = generator.generate(
prompt="your prompt here",
width=768,
height=768,
)
sd: Stable Diffusion 1.5/2.1
sdxl: Stable Diffusion XL
sd3: Stable Diffusion 3
flux: Flux.1
hunyuan: HunyuanDiT
Describe the image content you want to generate. Recommendations:
Example:
"a beautiful landscape with mountains and a lake, sunset, 4k, highly detailed"
Describe the elements you do not want to appear in the image. Commonly used:
"low quality, blurry, deformed, ugly, bad anatomy, watermark"
diffsynth/
├── __init__.py # module exports
├── text_to_image.py # core text-to-image implementation
├── diffsynth.py # PDF processing
├── aduib_ai.py # AduibAI client
└── mineru.py # MinerU PDF parsing
controllers/
├── text_to_image.py # text-to-image API endpoints
└── route.py # API route registration
outputs/
└── text_to_image/ # output directory for generated images
├── xxxxx.png
└── ...
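Since generated files accumulate under outputs/text_to_image/, a short script can list the most recent ones. This is a sketch that assumes the outputs are PNG files in that directory, as the layout above suggests; `latest_images` is a hypothetical helper.

```python
from pathlib import Path


def latest_images(output_dir="outputs/text_to_image", limit=5):
    """Return up to `limit` PNG files from output_dir, newest first."""
    directory = Path(output_dir)
    if not directory.is_dir():
        # Nothing has been generated yet, or the path is wrong.
        return []
    images = sorted(
        directory.glob("*.png"),
        key=lambda path: path.stat().st_mtime,
        reverse=True,
    )
    return images[:limit]


for image in latest_images():
    print(image)
```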
The batch_generate method reuses a model that is already loaded.
ImportError: cannot import name 'ModelManager' from 'diffsynth'
Fix: install diffsynth-engine
uv pip install diffsynth-engine
RuntimeError: CUDA out of memory
Fix:
Make sure the model file is in the expected location, or pass a correct model_path argument.
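A wrong model path can be caught before any model loading is attempted. The following is a minimal sketch; `check_model_path` is a hypothetical helper, and it assumes the model is a single file such as the .safetensors path used in the custom-model example.

```python
from pathlib import Path


def check_model_path(model_path):
    """Return the resolved path if it points to an existing file, else raise."""
    path = Path(model_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return path.resolve()


# Intended usage (assumes the TextToImageGenerator constructor shown earlier):
# model_file = check_model_path("/path/to/your/model.safetensors")
# generator = TextToImageGenerator(model_path=str(model_file), device="cuda")
```

Failing early with an explicit message is usually easier to diagnose than an error raised deep inside model loading.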
For complete examples, see:
diffsynth/text_to_image.py - core implementation
controllers/text_to_image.py - API endpoints
gradio_app.py - Web UI integration
If you run into problems, check: