How far can we go with ImageNet for Text-to-Image generation?

Training high-quality text-to-image generation models with 1/10th the parameters and 1/1000th the training images, in about 500 H100 GPU hours

MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

Improving text-to-image generation through multi-reward conditioning
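To make the tagline concrete, here is a minimal sketch of what conditioning a generator on several reward scores could look like. It is an illustration only: the reward set, layer sizes, and the choice to inject the embedding alongside the timestep embedding are assumptions, not the MIRO implementation.

```python
import torch
import torch.nn as nn


class MultiRewardConditioning(nn.Module):
    """Illustrative sketch: embed a vector of per-image reward scores
    (e.g. aesthetic, image-text alignment, preference) into a single
    conditioning vector that a diffusion backbone could add to its
    timestep embedding. Sizes and reward count are hypothetical."""

    def __init__(self, num_rewards: int = 4, embed_dim: int = 768):
        super().__init__()
        # One small MLP per reward so each score gets its own embedding.
        self.reward_mlps = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(1, embed_dim),
                    nn.SiLU(),
                    nn.Linear(embed_dim, embed_dim),
                )
                for _ in range(num_rewards)
            ]
        )

    def forward(self, rewards: torch.Tensor) -> torch.Tensor:
        # rewards: (batch, num_rewards), each score normalized beforehand.
        embs = [mlp(rewards[:, i : i + 1]) for i, mlp in enumerate(self.reward_mlps)]
        # Sum the per-reward embeddings into one conditioning vector.
        return torch.stack(embs, dim=0).sum(dim=0)  # (batch, embed_dim)


if __name__ == "__main__":
    cond = MultiRewardConditioning(num_rewards=4, embed_dim=768)
    scores = torch.rand(8, 4)  # placeholder reward scores for 8 images
    c = cond(scores)           # would be added to the timestep embedding
    print(c.shape)             # torch.Size([8, 768])
```

At sampling time, the same mechanism lets you ask for high scores across all rewards, which is the intuition behind conditioning rather than fine-tuning against a single reward.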