Image Hub
Our Image Hub allows users to create high-quality images. It is a fine-tuned version of the Stable Diffusion XL (SDXL) model, optimized for generating comic-style images.
The model was originally trained to generate the Skillful AI CyberCat NFT collection, so it performs best with cartoon-style images. However, it can be adapted for other styles with some experimentation.
History Section: Users can view all previously generated images along with the prompts used. This feature helps users keep track of their creations and revisit past work.
Multiple Images: Users can currently generate up to 4 images simultaneously from a single prompt.
Pricing: At present, generating a single image costs 3 cents. For example, running one prompt with 4 variations will cost 12 cents.
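For developers experimenting with similar settings outside the Hub, here is a minimal sketch, assuming the standard Hugging Face diffusers interface and the public SDXL checkpoint as a stand-in for our fine-tuned model, that generates four variations of one prompt and computes the cost at 3 cents per image:

```python
import torch
from diffusers import StableDiffusionXLPipeline

COST_PER_IMAGE_USD = 0.03  # current price: 3 cents per generated image

# Public SDXL checkpoint used as a stand-in for the fine-tuned Hub model.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "an elephant wearing sunglasses, comic style"
images = pipe(prompt, num_images_per_prompt=4).images  # up to 4 per prompt

print(f"Generated {len(images)} images for ${len(images) * COST_PER_IMAGE_USD:.2f}")
# Generated 4 images for $0.12
```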
Inference Steps
This setting controls the number of denoising steps the model takes during image generation.
More Steps: Produces more detailed, higher-quality images but takes longer. Tends to give more shadows, more saturated colours, and a glossy effect; works best for 3D character generations.
Fewer Steps: Quicker generation with simpler, more natural outputs; images may sometimes appear simplistic or overly smooth.
Tip: Extreme values for inference steps or guidance scale may lead to overly saturated or distorted images. Start with middle ranges to balance detail and efficiency.
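As a rough illustration of the trade-off, the sketch below (again assuming the diffusers interface and the public SDXL checkpoint rather than the Hub's own backend) renders the same prompt at a low and a high step count:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a cybernetic cat, comic style"

# Fewer steps: faster, but the output may look simplistic or overly smooth.
quick = pipe(prompt, num_inference_steps=20).images[0]

# More steps: slower, with more detail, deeper shadows, and a glossier finish.
detailed = pipe(prompt, num_inference_steps=50).images[0]

quick.save("cat_20_steps.png")
detailed.save("cat_50_steps.png")
```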
Seed Number (1-1000):
The seed determines whether image outputs stay consistent or vary between runs.
Same Seed: Reproduces the exact same image every time for a given prompt and settings.
Different Seeds: Generates diverse styles.
For instance, if you use the same seed (e.g., 75) and input the same prompt (e.g., "an elephant wearing sunglasses"), the resulting images will be identical for that seed.
The seed feature is particularly useful for:
Consistency: Maintaining a uniform style across multiple images, such as creating a series featuring an animal or specific objects in the same artistic style.
Variety: By changing the seed, you can explore different styles for the same subject.
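The same behaviour can be sketched with the diffusers library, where the seed is passed through a torch.Generator (the Hub exposes it directly as a number between 1 and 1000):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "an elephant wearing sunglasses"

# Same seed + same prompt + same settings -> identical image on every run.
fixed = torch.Generator(device="cuda").manual_seed(75)
consistent = pipe(prompt, generator=fixed).images[0]

# A different seed -> a different interpretation of the same subject.
other = torch.Generator(device="cuda").manual_seed(312)
variation = pipe(prompt, generator=other).images[0]
```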
Guidance Scale:
Adjusts how strictly the model follows the prompt.
Higher Scale: Aligns more closely with the prompt, potentially reducing creativity.
Lower Scale: Allows for more flexibility and imaginative results.
Extremes in guidance scale might lead to unusual or broken images, especially if the prompt includes elements outside the system’s training scope.
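A quick sketch of the same trade-off with diffusers, with the specific scale values chosen only for illustration:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a cybernetic cat surfing a comet, comic style"

# Lower scale: looser prompt adherence, more imaginative results.
loose = pipe(prompt, guidance_scale=4.0).images[0]

# Higher scale: closer prompt adherence; extreme values can distort images.
strict = pipe(prompt, guidance_scale=12.0).images[0]
```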
Comic Style Toggle: Activates the comic/cartoon effect. This is optimized for the model’s training style.
The foundation of the Image Hub is Stable Diffusion XL (SDXL), an open-source model recognized for its exceptional image generation quality and flexible commercial licensing.
We enhanced the model’s capabilities for comic-style outputs through a process of fine-tuning using LoRA (Low-Rank Adaptation). This allowed us to adapt the model without altering its original weights, while retaining memory efficiency and cost-effectiveness.
Using LoRA, we added extra trainable layers (specialized low-rank matrices) focused on the comic style, keeping memory use low. Training was done on NVIDIA L4 GPUs through Google Colab Pro, a cloud-based service, using the Hugging Face diffusers library and the Accelerate framework to manage resources.
Through constant testing and improvements, the model was refined to generate high-quality comic-style images.
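Conceptually, applying the adapter looks like the sketch below; the LoRA weight path and adapter name are hypothetical placeholders, not the actual Image Hub artifacts, and the toggle shown roughly corresponds to what the Comic Style Toggle does:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load the comic-style LoRA on top of the frozen base weights
# (path and adapter name are hypothetical).
pipe.load_lora_weights("path/to/comic_style_lora", adapter_name="comic")

pipe.set_adapters(["comic"], adapter_weights=[1.0])  # comic style on
comic_image = pipe("a cybernetic cat, comic style").images[0]

pipe.disable_lora()                                  # comic style off
plain_image = pipe("a cybernetic cat").images[0]
```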
Image Editing: An advanced editor will allow users to modify generated images directly within the platform.
NFT Minting: Users will be able to mint individual images as NFTs. Collections can also be created and minted as NFT sets for broader applications.