Up to 405B: Llama 3.1 Released, a First-Look Breakdown

Published: 2024-07-24 03:17:27

As this article goes out, Meta has released Llama 3.1, and 「赛博禅心」 brings you a detailed report right away.

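This Release

At 23:00 Beijing time on July 23, 2024, Meta officially released its latest open-source model, Llama 3.1, in three sizes (8B, 70B, and 405B), with the maximum context length raised to 128K.

Of these, 405B is Meta's most capable model to date. Judging by benchmark scores, it surpasses GPT-4 0125 and is roughly on par with Claude 3.5.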

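Fun fact: it leaked early

Yesterday afternoon, the 405B Llama appears to have leaked on Hugging Face, causing a small stir on Twitter (the link is no longer accessible). Someone even converted it into a magnet link, roughly 800 GB in size.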
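The reported size is about what simple arithmetic would predict. The sketch below assumes the weights are stored at 2 bytes per parameter (for example bf16); that storage format is an assumption for illustration, not something the leak confirmed.

```python
# Back-of-the-envelope size of a 405B-parameter checkpoint.
# Assumption: each parameter is stored in 2 bytes (e.g. bf16).
PARAMS = 405e9
BYTES_PER_PARAM = 2  # assumed, not confirmed by the source


def checkpoint_size_gb(params: float, bytes_per_param: int) -> float:
    """Return the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


size_gb = checkpoint_size_gb(PARAMS, BYTES_PER_PARAM)
print(f"{size_gb:.0f} GB")
```

Under that assumption the raw weights alone come to about 810 GB, in line with the roughly 800 GB figure quoted for the magnet link.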

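The previous version was Llama 3

Three months ago, Meta open-sourced the 8B and 70B Llama 3 models. For details, see our earlier report: "First on the Web: A Complete Breakdown of Meta Llama-3".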

Part 1

This Release

The open-source Llama 3.1 comes in three sizes (8B, 70B, and 405B), with improved performance and a maximum context length of 128K.

Llama 3.1

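Old models, newly upgraded

The previously released 8B and 70B Llama 3 models have been upgraded to Llama 3.1, with the context length increased to 128K and stronger reasoning capabilities.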

405B: the extra-large size

405B is the brand-new model in this release, and it is highly capable, on par with today's strongest models, GPT-4 and Claude 3.5.

Improvements across the board

Further benchmark comparisons are as follows.

Data and Training

Massive data

Llama 3.1 405B was trained on more than 15 trillion tokens of publicly available data, using more than 16,000 H100 GPUs.
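For a rough sense of scale, a common rule of thumb estimates training compute at about 6 floating-point operations per parameter per token. The sketch below applies that approximation to the 405B model and the 15T-token figure above. The rule of thumb and the resulting number are an illustrative estimate, not a figure reported in this article.

```python
# Rough training-compute estimate using the common "6 * N * D" rule of thumb.
# Assumption: about 6 FLOPs per parameter per training token (forward + backward).
def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * params * tokens


flops = training_flops(params=405e9, tokens=15e12)
print(f"{flops:.2e} FLOPs")
```

With these inputs the estimate lands in the mid-10^25 FLOPs range.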

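Training approach (a 92-page PDF is attached to this article)

Meta chose a standard decoder-only transformer architecture with minor adaptations, rather than a mixture-of-experts model, to maximize training stability. It adopted an iterative post-training procedure in which each round uses supervised fine-tuning and direct preference optimization.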

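Fine-tuning

In post-training, Meta produces the final chat models through several rounds of alignment. Each round involves supervised fine-tuning (SFT), rejection sampling (RS), and direct preference optimization (DPO). The vast majority of SFT examples are produced with synthetic data generation, iterated multiple times to yield progressively higher-quality synthetic data covering all capabilities.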
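To make two of these steps concrete, here is a minimal sketch of best-of-N rejection sampling and of the standard DPO loss for a single preference pair. It is an illustration under stated assumptions rather than Meta's implementation: the `reward` callable, the function names, and the default `beta=0.1` are all hypothetical choices made for this example.

```python
import math
from typing import Callable, List


def rejection_sample(candidates: List[str], reward: Callable[[str], float]) -> str:
    """Best-of-N selection: keep the candidate the reward function scores highest."""
    return max(candidates, key=reward)


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value for illustration
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * [(logp_c - ref_c) - (logp_r - ref_r)]))
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) == softplus(-m); computed in a numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy usage: the "reward" here is just string length, purely for demonstration.
best = rejection_sample(["a", "bbb", "cc"], reward=len)
loss_neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)   # policy equals reference
loss_better = dpo_loss(-4.0, -6.0, -5.0, -5.0)    # policy prefers the chosen answer
```

When the policy matches the reference model the loss equals log 2, and it falls as the policy assigns relatively more probability to the chosen response than to the rejected one.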

Open-Source Resources

Official documentation

https://llama.meta.com/docs/overview/

Hugging Face

https://huggingface.co/meta-llama

GitHub

https://github.com/meta-llama

Kaggle

https://www.kaggle.com/organizations/metaresearch/models

Part 2

Zuckerberg's Letter

The release is accompanied by a letter from Zuckerberg:

Open Source AI Is the Path Forward

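It makes for a striking contrast with another piece:

Robin Li's WAIC roundtable interview: open-source models are an "IQ tax", and agents are taking off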

Open Source AI Is the Path Forward

In the early days of high-performance computing, the major tech companies of the day each invested heavily in developing their own closed source versions of Unix. It was hard to imagine at the time that any other approach could develop such advanced software. Eventually though, open source Linux gained popularity – initially because it allowed developers to modify its code however they wanted and was more affordable, and over time because it became more advanced, more secure, and had a broader ecosystem supporting more capabilities than any closed Unix. Today, Linux is the industry standard foundation for both cloud computing and the operating systems that run most mobile devices – and we all benefit from superior products because of it.

I believe that AI will develop in a similar way. Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas. Starting next year, we expect future Llama models to become the most advanced in the industry. But even before that, Llama is already leading on openness, modifiability, and cost efficiency.

Today we’re taking the next steps towards open source AI becoming the industry standard. We’re releasing Llama 3.1 405B, the first frontier-level open source AI model, as well as new and improved Llama 3.1 70B and 8B models. In addition to having significantly better cost/performance relative to closed models, the fact that the 405B model is open will make it the best choice for fine-tuning and distilling smaller models.

Beyond releasing these models, we’re working with a range of companies to grow the broader ecosystem. Amazon, Databricks, and Nvidia are launching full suites of services to support developers fine-tuning and distilling their own models. Innovators like Groq have built low-latency, low-cost inference serving for all the new models. The models will be available on all major clouds including AWS, Azure, Google, Oracle, and more. Companies like Scale.AI, Dell, Deloitte, and others are ready to help enterprises adopt Llama and train custom models with their own data. As the community grows and more companies develop new services, we can collectively make Llama the industry standard and bring the benefits of AI to everyone.

Meta is committed to open source AI. I’ll outline why I believe open source is the best development stack for you, why open sourcing Llama is good for Meta, and why open source AI is good for the world and therefore a platform that will be around for the long term.

Why Open Source AI Is Good for Developers

When I talk to developers, CEOs, and government officials across the world, I usually hear several themes:

We need to train, fine-tune, and distill our own models. Every organization has different needs that are best met with models of different sizes that are trained or fine-tuned with their specific data. On-device tasks and classification tasks require small models, while more complicated tasks require larger models. Now you’ll be able to take the most advanced Llama models, continue training them with your own data and then distill them down to a model of your optimal size – without us or anyone else seeing your data.

We need to control our own destiny and not get locked into a closed vendor. Many organizations don’t want to depend on models they cannot run and control themselves. They don’t want closed model providers to be able to change their model, alter their terms of use, or even stop serving them entirely. They also don’t want to get locked into a single cloud that has exclusive rights to a model. Open source enables a broad ecosystem of companies with compatible toolchains that you can move between easily.

We need to protect our data. Many organizations handle sensitive data that they need to secure and can’t send to closed models over cloud APIs. Other organizations simply don’t trust the closed model providers with their data. Open source addresses these issues by enabling you to run the models wherever you want. It is well-accepted that open source software tends to be more secure because it is developed more transparently.

We need a model that is efficient and affordable to run. Developers can run inference on Llama 3.1 405B on their own infra at roughly 50% the cost of using closed models like GPT-4o, for both user-facing and offline inference tasks.

We want to invest in the ecosystem that’s going to be the standard for the long term. Lots of people see that open source is advancing at a faster rate than closed models, and they want to build their systems on the architecture that will give them the greatest advantage long term.

Why Open Source AI Is Good for Meta

Meta’s business model is about building the best experiences and services for people. To do this, we must ensure that we always have access to the best technology, and that we’re not locking into a competitor’s closed ecosystem where they can restrict what we build.

One of my formative experiences has been building our services constrained by what Apple will let us build on their platforms. Between the way they tax developers, the arbitrary rules they apply, and all the product innovations they block from shipping, it’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build. On a philosophical level, this is a major reason why I believe so strongly in building open ecosystems in AI and AR/VR for the next generation of computing.

People often ask if I’m worried about giving up a technical advantage by open sourcing Llama, but I think this misses the big picture for a few reasons:

First, to ensure that we have access to the best technology and aren’t locked into a closed ecosystem over the long term, Llama needs to develop into a full ecosystem of tools, efficiency improvements, silicon optimizations, and other integrations. If we were the only company using Llama, this ecosystem wouldn’t develop and we’d fare no better than the closed variants of Unix.

Second, I expect AI development will continue to be very competitive, which means that open sourcing any given model isn’t giving away a massive advantage over the next best models at that point in time. The path for Llama to become the industry standard is by being consistently competitive, efficient, and open generation after generation.

Third, a key difference between Meta and closed model providers is that selling access to AI models isn’t our business model. That means openly releasing Llama doesn’t undercut our revenue, sustainability, or ability to invest in research like it does for closed providers. (This is one reason several closed providers consistently lobby governments against open source.)

Finally, Meta has a long history of open source projects and successes. We’ve saved billions of dollars by releasing our server, network, and data center designs with Open Compute Project and having supply chains standardize on our designs. We benefited from the ecosystem’s innovations by open sourcing leading tools like PyTorch, React, and many more tools. This approach has consistently worked for us when we stick with it over the long term.

Why Open Source AI Is Good for the World

I believe that open source is necessary for a positive AI future. AI has more potential than any other modern technology to increase human productivity, creativity, and quality of life – and to accelerate economic growth while unlocking progress in medical and scientific research. Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn’t concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society.

There is an ongoing debate about the safety of open source AI models, and my view is that open source AI will be safer than the alternatives. I think governments will conclude it’s in their interest to support open source because it will make the world more prosperous and safer.

My framework for understanding safety is that we need to protect against two categories of harm: unintentional and intentional. Unintentional harm is when an AI system may cause harm even when it was not the intent of those running it to do so. For example, modern AI models may inadvertently give bad health advice. Or, in more futuristic scenarios, some worry that models may unintentionally self-replicate or hyper-optimize goals to the detriment of humanity. Intentional harm is when a bad actor uses an AI model with the goal of causing harm.

It’s worth noting that unintentional harm covers the majority of concerns people have around AI – ranging from what influence AI systems will have on the billions of people who will use them to most of the truly catastrophic science fiction scenarios for humanity. On this front, open source should be significantly safer since the systems are more transparent and can be widely scrutinized. Historically, open source software has been more secure for this reason. Similarly, using Llama with its safety systems like Llama Guard will likely be safer and more secure than closed models. For this reason, most conversations around open source AI safety focus on intentional harm.

Our safety process includes rigorous testing and red-teaming to assess whether our models are capable of meaningful harm, with the goal of mitigating risks before release. Since the models are open, anyone is capable of testing for themselves as well. We must keep in mind that these models are trained by information that’s already on the internet, so the starting point when considering harm should be whether a model can facilitate more harm than information that can quickly be retrieved from Google or other search results.

When reasoning about intentional harm, it’s helpful to distinguish between what individual or small scale actors may be able to do as opposed to what large scale actors like nation states with vast resources may be able to do.

At some point in the future, individual bad actors may be able to use the intelligence of AI models to fabricate entirely new harms from the information available on the internet. At this point, the balance of power will be critical to AI safety. I think it will be better to live in a world where AI is widely deployed so that larger actors can check the power of smaller bad actors. This is how we’ve managed security on our social networks – our more robust AI systems identify and stop threats from less sophisticated actors who often use smaller scale AI systems. More broadly, larger institutions deploying AI at scale will promote security and stability across society. As long as everyone has access to similar generations of models – which open source promotes – then governments and institutions with more compute resources will be able to check bad actors with less compute.

The next question is how the US and democratic nations should handle the threat of states with massive resources like China. The United States’ advantage is decentralized and open innovation. Some people argue that we must close our models to prevent China from gaining access to them, but my view is that this will not work and will only disadvantage the US and its allies. Our adversaries are great at espionage, stealing models that fit on a thumb drive is relatively easy, and most tech companies are far from operating in a way that would make this more difficult. It seems most likely that a world of only closed models results in a small number of big companies plus our geopolitical adversaries having access to leading models, while startups, universities, and small businesses miss out on opportunities. Plus, constraining American innovation to closed development increases the chance that we don’t lead at all. Instead, I think our best strategy is to build a robust open ecosystem and have our leading companies work closely with our government and allies to ensure they can best take advantage of the latest advances and achieve a sustainable first-mover advantage over the long term.

When you consider the opportunities ahead, remember that most of today’s leading tech companies and scientific research are built on open source software. The next generation of companies and research will use open source AI if we collectively invest in it. That includes startups just getting off the ground as well as people in universities and countries that may not have the resources to develop their own state-of-the-art AI from scratch.

The bottom line is that open source AI represents the world’s best shot at harnessing this technology to create the greatest economic opportunity and security for everyone.

Let’s Build This Together

With past Llama models, Meta developed them for ourselves and then released them, but didn’t focus much on building a broader ecosystem. We’re taking a different approach with this release. We’re building teams internally to enable as many developers and partners as possible to use Llama, and we’re actively building partnerships so that more companies in the ecosystem can offer unique functionality to their customers as well.

I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here. I hope you’ll join us on this journey to bring the benefits of AI to everyone in the world.
