PolyU’s Game-Changing Co-GenAI & FP8 Solutions: How Hong Kong is Democratizing AI Training with 48% Shorter Training Times!
The landscape of Artificial Intelligence is evolving at an unprecedented pace, driven by relentless innovation from academic institutions and industry leaders alike. In a significant move that promises to redefine accessibility and efficiency in AI development, the Hong Kong Polytechnic University (PolyU) has recently open-sourced its groundbreaking Co-GenAI framework and an end-to-end FP8 low-bit training solution. These developments are not just incremental improvements; they represent a paradigm shift towards more inclusive, efficient, and privacy-preserving generative AI (GenAI) research and deployment globally.
Democratizing AI with the Co-GenAI Framework
At the heart of PolyU’s announcement is the Co-GenAI framework, a novel collaborative training paradigm developed by the PolyU Academy for Artificial Intelligence (PAAI). This framework fundamentally transforms the conventional centralized model training approach into a decentralized one. For years, the prohibitive computational costs associated with training large foundation models have effectively limited their development to a handful of well-resourced organizations, creating an “AI rich game” as Professor Yang Hongxia, Executive Director of PAAI, aptly described it.
Image: A futuristic data center with glowing servers and a collaborative network of AI models, symbolizing distributed AI training and resource efficiency.
The Co-GenAI framework directly addresses this critical challenge by drastically reducing resource requirements, protecting invaluable data privacy, and dismantling barriers such as GPU monopolies. This fosters a far more inclusive environment, empowering global institutions, researchers, and developers to actively participate in cutting-edge AI research without the astronomical overheads previously required.
Key innovations within Co-GenAI include ultra-low-resource training and decentralized model fusion. This framework tackles persistent issues such as data silos due to privacy and copyright concerns, particularly prevalent in sensitive domains like healthcare and finance. Furthermore, it overcomes the static nature of traditional foundation models, which often require enormous resources for retraining to adapt to new knowledge.
PolyU’s “InfiFusion” model fusion technology stands out as a testament to this collaborative approach. It can merge multiple state-of-the-art models in just hundreds of GPU hours, a stark contrast to the millions of GPU hours typically required to train such models from scratch. This not only delivers fused models that significantly outperform their originals across key benchmarks but also paves the way for a more agile and efficient development cycle. Significantly, PolyU has also provided the first theoretical validation of model fusion, proposing a “Model Merging Scaling Law” which could offer a new pathway towards Artificial General Intelligence (AGI).
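To make the idea of merging models concrete, here is a minimal sketch of the simplest form of weight-space merging: a weighted average of parameters from models that share one architecture. This is not InfiFusion, whose method the article does not describe. The `merge_weighted_average` function, the toy `StateDict` type, and the parameter names are all hypothetical and used only for illustration.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_weighted_average(
    models: Sequence[StateDict], weights: Sequence[float]
) -> StateDict:
    """Merge same-architecture models by a weighted average of each parameter."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    names = models[0].keys()
    for m in models:
        if m.keys() != names:
            raise ValueError("all models must have the same parameter names")

    merged: StateDict = {}
    for name in names:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"parameter {name!r} differs in size across models")
        # Element-wise weighted mean across the input models.
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(models, weights)) / total
            for i in range(size)
        ]
    return merged


if __name__ == "__main__":
    model_a = {"layer.weight": [1.0, 3.0]}
    model_b = {"layer.weight": [3.0, 5.0]}
    print(merge_weighted_average([model_a, model_b], [1.0, 1.0]))
```

A merge like this needs no gradient steps at all, which is one reason merging can be far cheaper than training from scratch. Practical fusion methods are considerably more involved than a plain average.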
Real-World Impact and Applications
Image: A medical professional using an AI system for personalized treatment planning, illustrating an application of the Co-GenAI framework in healthcare.
The practical implications of Co-GenAI are profound. PolyU has already successfully applied its pipelines to domain-specific models, particularly in medical and cancer AI systems. These models have demonstrated best-in-class results in areas such as diagnosis, reasoning, and research agent applications, including complex task handling and report generation. Collaborations with prominent hospitals, including Queen Elizabeth Hospital in Hong Kong and mainland institutions, underscore the real-world applicability and potential to advance personalized treatment and AI-based radiotherapy.
Revolutionizing Efficiency with FP8 Low-Bit Training
Complementing the Co-GenAI framework, PolyU has also made a monumental contribution by open-sourcing an end-to-end FP8 low-bit training solution. This marks PolyU as the first academic institution globally to provide such a comprehensive, open-source solution covering both continual pre-training (CPT) and post-training stages.
FP8 (8-bit floating point) low-bit training addresses the computational intensity of modern AI models, especially Large Language Models (LLMs). BF16 (Brain Floating Point 16) has been a standard format for efficient neural network training; FP8 halves the bit width again, using coarser 8-bit formats (commonly E4M3 and E5M2) and relying on careful scaling to keep training numerically stable. By employing lower-precision data formats for variables like gradients and optimizer states, FP8 training significantly reduces computational requirements and memory consumption while aiming to keep model accuracy on par with BF16.
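The trade-off between bit width and precision can be illustrated with a small simulation. The sketch below rounds values to the E4M3 grid (4 exponent bits, 3 mantissa bits, largest finite value 448) after applying a per-tensor scale, then scales back. The choice of E4M3, the per-tensor scaling scheme, and the names `round_to_e4m3` and `fake_quantize` are assumptions made for illustration; the article does not specify which FP8 variant or scaling strategy PolyU's solution uses.

```python
import math
from typing import List

# E4M3 parameters (assumed format; not confirmed by the article).
E4M3_MAX = 448.0          # largest finite magnitude
E4M3_MIN_NORMAL_EXP = -6  # exponent of the smallest normal number
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)  # subnormals share the lowest exponent
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing between neighbouring values
    # Python's round() rounds halves to even, like IEEE round-to-nearest-even.
    return sign * round(mag / step) * step


def fake_quantize(values: List[float]) -> List[float]:
    """Scale so the largest magnitude maps to 448, round to E4M3, scale back."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    scale = E4M3_MAX / amax
    return [round_to_e4m3(v * scale) / scale for v in values]


if __name__ == "__main__":
    grads = [0.001, -0.02, 0.5]
    for original, restored in zip(grads, fake_quantize(grads)):
        print(f"{original:+.6f} -> {restored:+.6f}")
```

With only three mantissa bits, each value in the normal range lands within about 6.25% of its original after rounding. Scaling matters because it moves the tensor into the part of the range where that bound holds.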
Image: An illustration of an 8-bit floating-point (FP8) data structure contrasted with larger 16-bit and 32-bit structures, representing data compression and efficiency.
Tangible Benefits and Performance Gains
Image: A split view contrasting a large, energy-hungry GPU farm with a leaner AI training setup that uses FP8 low-bit training.
The performance enhancements offered by PolyU’s FP8 solution are striking:
**Speed:** It reaches BF16-level accuracy while training over 20% faster than traditional methods. A second reported figure is a 48% cut in training time compared with the mainstream BF16 approach, which would roughly halve the computing power needed for similar results. Separate studies on FP8-LM report runs 64-75% faster than BF16 frameworks.
**Memory Efficiency:** The solution uses 10% less peak memory and reduces video (GPU) memory usage by approximately 24%. Studies on FP8-LM report a 39-42% reduction in real memory usage when training large models such as GPT-175B on H100 GPU platforms.
**Reduced Overheads:** The dramatic lowering of training overheads makes advanced AI training more feasible for a broader range of institutions.
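The figures above mix two different measures: "X% faster" and "training time cut by X%". A short calculation shows how they relate. It assumes that "faster" refers to throughput (work done per unit time), which the article does not state explicitly; the function names are illustrative.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Throughput multiplier implied by cutting wall-clock time by `reduction`."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def time_reduction_from_speedup(speedup: float) -> float:
    """Fraction of wall-clock time saved when throughput is multiplied by `speedup`."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # A 48% cut in training time corresponds to roughly 1.92x throughput.
    print(f"{speedup_from_time_reduction(0.48):.2f}x")
    # Training "20% faster" (1.2x throughput) saves roughly 16.7% of the time.
    print(f"{time_reduction_from_speedup(1.2):.1%}")
```

Under that reading, a 48% reduction in training time is a much larger gain than "48% faster", and "over 20% faster" corresponds to saving about a sixth of the training time.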
The integrated pipeline incorporates continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), ensuring high-quality output while further shortening overall training times. This holistic approach maintains model performance comparable to major systems, all while significantly cutting resource demands. The team is already looking ahead, actively exploring even lower-cost FP4 precision training, with initial promising results reported in academic publications.
A New Era of Collaborative and Efficient AI
Image: Researchers and developers from around the world collaborating on AI models across interconnected screens, symbolizing the democratization of AI.
The combined open-sourcing of the Co-GenAI framework and the FP8 low-bit training solution positions PolyU at the forefront of AI innovation, not just in research but in practical, accessible deployment. As Professor Christopher Chao, Senior Vice President (Research and Innovation) at PolyU, noted, AI is a key driver for new, quality productive forces, and PAAI is accelerating AI integration across industries. This initiative is set to democratize access to cutting-edge AI capabilities, fostering a future where advanced GenAI research is no longer an exclusive domain but a collaborative global endeavor. These breakthroughs will undoubtedly inspire a new generation of AI applications and accelerate the pace of innovation worldwide, solidifying Hong Kong’s position as a global hub for GenAI research.