Guosen Securities: DeepSeek's Multi-layer Tech Enhances Training Efficiency; Test Performance Outruns Open-source Models
DeepSeek, a Chinese AI company, launched and open-sourced the DeepSeek-V3 model on 26 December 2024. According to a research report from Guosen Securities, the model outperformed comparable open-source models on a range of benchmarks, was competitive with top closed-source models in key areas, and was trained at low cost. At the model layer, V3 adopts a Mixture-of-Experts (MoE) framework and, after multi-stage training and capability refinement, outperforms open-source models on knowledge, code, mathematical reasoning and other tests. At the architecture layer, it follows the V2 architecture while introducing new techniques, such as an auxiliary-loss-free load-balancing strategy and multi-token prediction (MTP), to improve data utilization. At the training layer, it achieves cost control and efficiency gains through the DualPipe algorithm and FP8 mixed-precision training. AAStocks Financial News