Guosen Securities: DeepSeek's Multi-layer Tech Enhances Training Efficiency; Test Performance Outruns Open-source Models
DeepSeek, a Chinese AI company, launched and open-sourced the DeepSeek-V3 model on 26 December 2024. According to a research report from Guosen Securities, the model outperformed comparable open-source models on a range of benchmarks, was competitive with top closed-source models in key areas, and was trained at low cost. At the model layer, V3 adopts a Mixture-of-Experts (MoE) framework and, after multi-stage training and capability refinement, outperforms open-source models on knowledge, code, mathematical reasoning and other tests. At the architecture layer, it follows the V2 architecture while introducing new techniques, such as an auxiliary-loss-free load-balancing strategy and multi-token prediction (MTP), to improve data utilization. At the training layer, it achieves cost control and efficiency gains through the DualPipe algorithm and FP8 mixed-precision training. AAStocks Financial News