Paper: https://arxiv.org/pdf/2411.14405 · GitHub: https://github.com/AIDC-AI/Marco-o1
The Alibaba MarcoPolo team introduces Marco-o1, a large reasoning model designed to excel at open-ended problem solving, in contrast to earlier reasoning models that focus on tasks with well-defined answers and clear reward signals. Marco-o1 combines Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), and reasoning action strategies that vary the granularity of the search (full reasoning steps versus smaller "mini-steps") to improve accuracy. Fine-tuning draws on a mix of filtered open-source and synthetic CoT datasets, and a reflection mechanism prompts the model to self-critique and revise its own reasoning. Experiments show accuracy gains on the MGSM benchmarks and strong results in translating colloquial and slang expressions. Future work targets refining the MCTS reward signal and applying reinforcement learning techniques.
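To make the MCTS guidance concrete, below is a minimal sketch of the confidence-based reward the paper describes for scoring rollouts: each generated token's log-probability is softmax-normalized against the log-probabilities of its top alternative tokens, and the per-token confidences are averaged into a rollout value. The function name, the input layout, and the exact normalization details here are illustrative assumptions, not the authors' implementation.

```python
import math

def rollout_confidence(token_logprobs: list[list[float]]) -> float:
    """Confidence-based reward for a single MCTS rollout (illustrative sketch).

    token_logprobs: for each generated token, a list whose first entry is the
    chosen token's log-probability, followed by the log-probabilities of its
    top alternative tokens (the paper uses the top 5).
    """
    confidences = []
    for logprobs in token_logprobs:
        chosen = logprobs[0]
        # Softmax denominator over the chosen token and its alternatives.
        denom = sum(math.exp(lp) for lp in logprobs)
        confidences.append(math.exp(chosen) / denom)
    # The rollout reward is the mean per-token confidence.
    return sum(confidences) / len(confidences)

# Hypothetical example: two tokens, each with the chosen log-prob plus 5 alternatives.
example = [
    [-0.1, -2.3, -3.0, -3.5, -4.0, -4.2],
    [-0.5, -1.2, -2.8, -3.1, -3.9, -4.5],
]
print(f"rollout reward v = {rollout_confidence(example):.3f}")
```

In the search, higher-confidence rollouts steer node selection toward reasoning paths the model itself considers more reliable, which is how MCTS supplies a reward signal for open-ended problems that lack a checkable final answer.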
ai, llm, alibaba, artificial intelligence, arxiv, research, paper, publication, genai, generativeai, agentic