Guosen Securities: DeepSeek's Multi-layer Technology Enhances Training Efficiency; Test Performance Outpaces Open-source Models
DeepSeek, a Chinese AI company, launched and open-sourced the DeepSeek-V3 model on 26 December 2024. The model outperformed comparable open-source models in a number of benchmarks and was on par with top closed-source models in key areas, all at a low training cost, according to a research report from Guosen Securities.

At the model layer, DeepSeek-V3 adopts a Mixture-of-Experts (MoE) architecture and, after multi-stage training and capability refinement, outperforms open-source models in knowledge, code, mathematical reasoning and other benchmarks.
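For readers unfamiliar with MoE, the core idea is that each token is routed to a small subset of "expert" sub-networks rather than through one dense network. The sketch below is a generic top-k router in Python; the gate weights, expert functions and k are illustrative, not DeepSeek-V3's actual implementation.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    Generic top-k MoE sketch (illustrative only, not DeepSeek-V3's router):
    a linear gate scores every expert per token; only the k best experts
    run, and their outputs are combined by softmax-normalized gate scores.
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                             # softmax over selected experts
        for weight, e in zip(w, topk[t]):
            out[t] += weight * experts[e](x[t])  # only k experts compute
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
gate_w = rng.normal(size=(d, n_experts))
# toy experts: independent linear maps (stand-ins for expert FFNs)
experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n_experts)]
x = rng.normal(size=(tokens, d))
y = moe_forward(x, gate_w, experts)
print(y.shape)
```

The efficiency claim rests on sparsity: with k experts active out of n, per-token compute scales with k while parameter count scales with n.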

The architecture layer builds on the V2 architecture and introduces new techniques, such as an auxiliary-loss-free load-balancing strategy and multi-token prediction (MTP), to improve data utilization. The training layer achieves cost control and efficiency gains through the DualPipe algorithm and FP8 mixed-precision training.
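The cost savings of FP8 mixed precision come from storing activations and weights in an 8-bit floating format (e.g. E4M3, with a 3-bit mantissa and roughly a ±448 dynamic range) while keeping a scale factor to recover the original magnitudes. The sketch below only simulates per-tensor E4M3-style rounding in NumPy; it is an illustration of the precision trade-off, not DeepSeek's training kernel.

```python
import numpy as np

def fp8_e4m3_quantize(x):
    """Simulate per-tensor FP8 (E4M3-style) quantization.

    Illustrative sketch only: scale the tensor into E4M3's dynamic range
    (max normal value ~448), round the significand to 4 bits via
    frexp/ldexp, then rescale back. Real FP8 training uses hardware
    formats and per-block scaling; this just shows the rounding effect.
    """
    amax = np.abs(x).max()
    scale = 448.0 / amax if amax > 0 else 1.0
    scaled = x * scale
    m, e = np.frexp(scaled)                  # scaled = m * 2**e, m in [0.5, 1)
    q = np.ldexp(np.round(m * 16) / 16, e)   # keep a 4-bit significand
    return q / scale, scale

x = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
xq, s = fp8_e4m3_quantize(x)
rel_err = np.abs(xq - x).max() / np.abs(x).max()
print(rel_err)
```

The point of the exercise: halving storage versus FP16 costs only a few percent of relative precision per tensor, which is why FP8 is attractive for cutting training memory and bandwidth.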
AAStocks Financial News