How far can we go with ImageNet for Text-to-Image generation? Training high-quality text-to-image generation models with 1/10th the parameters, 1/1000th the training images, in about 500 H100 hours