- Zedge (ZDGE) introduces DataSeeds.AI Sample Dataset (DSD) to revolutionize AI training.
- AI models trained with DSD exhibit a 70% performance leap over conventional datasets.
- Dataset includes 7,843 peer-reviewed images with expert annotations.
Zedge (ZDGE) has unveiled the DataSeeds.AI Sample Dataset (DSD), a pioneering image dataset designed specifically for training computer vision and generative AI models. The initiative, carried out in partnership with Perle.ai and Émet Research, represents a significant advancement in AI training resources.
The dataset comprises 7,843 high-quality images from the GuruShots photography community, complete with extensive annotations and expert reviews. Models trained on DSD showed a 70% improvement over benchmark datasets. A notable highlight is the LLaVA-NeXT model, which demonstrated a 24.09% uplift in BLEU-4 scores. The results also point to the limits of automated tagging: AWS Rekognition achieved an F1 score of only 0.19 when measured against DSD's annotations.
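For readers unfamiliar with the F1 metric cited above, the following is a minimal sketch of how an F1 score between automated tags and human annotations could be computed. It assumes tags are compared as case-insensitive sets; the function name `tag_f1` and the sample tags are hypothetical illustrations, not the evaluation code or data used for DSD.

```python
def tag_f1(predicted, reference):
    """F1 between two tag collections, compared as case-insensitive sets."""
    pred = {t.strip().lower() for t in predicted}
    ref = {t.strip().lower() for t in reference}
    true_pos = len(pred & ref)  # tags present in both sets
    if true_pos == 0:
        return 0.0  # also covers the case where either set is empty
    precision = true_pos / len(pred)  # share of predicted tags that are correct
    recall = true_pos / len(ref)      # share of reference tags that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags for a single image (not taken from DSD).
auto_tags = ["Dog", "grass", "outdoors", "animal"]
human_tags = ["dog", "golden retriever", "park", "running", "grass", "sunlight"]

score = tag_f1(auto_tags, human_tags)
print(f"F1 = {score:.2f}")
```

A low F1 under this kind of comparison means the automated tags both miss many human-supplied labels and include labels the annotators did not use.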
DataSeeds.AI taps into Zedge Premium's extensive catalog of over 30 million rights-cleared images, enabling rapid creation of custom AI training datasets through targeted GuruShots challenges. This positions DataSeeds.AI as a crucial supplier for enterprises seeking high-quality training data.
Jonathan Reich, CEO of Zedge, remarked on the milestone achievement for DataSeeds.AI, noting its potential to supply high-quality, rights-cleared images to enterprises developing foundational AI models. The release marks a strategic shift for Zedge, diversifying its business model to include enterprise-grade AI training resources.
The DSD not only provides pixel-level segmentation, structured scene descriptions, and technical metadata but also emphasizes human-aligned annotations. This approach enhances the interpretability and grounding of AI models, facilitating real-world application and better performance.
The DataSeeds.AI program is adaptable, capable of launching on-demand photo challenges to meet specific domain requirements swiftly. This, combined with the human-expert annotation process, ensures the delivery of nuanced, high-fidelity training data that surpasses automated tagging systems.
The DSD, together with the associated model weights and research findings, is openly available on platforms such as Hugging Face, fostering broader adoption and continued innovation in the AI community.