
Optimizing Distributed Training for LLMs by Oak Ridge National Lab: Upcoming Talk

Plus: Advancements in robot systems deployment – upcoming talk. Top AI researchers on the future of AI systems – video recording. How GNNs are changing weather prediction – written summary.

Hello, fellow human! Sophia here. We've put together a couple of talks for the next few weeks. Please check them out; I hope you can join one of them.

Table of Contents

Upcoming Talk (March 21st): Optimizing Distributed Training for LLMs by Oak Ridge National Laboratory.

Upcoming Talk (March 28th): Recent Advancements in Robot Systems Deployment by Stanford AI Lab.

Video Recording: Top AI Researchers Give Their Prediction on How AI Systems Will Develop Further.

Written Summary: How Graph Neural Networks-Based Models Change Weather Prediction Traditions by Google DeepMind.

Upcoming Talk: Optimizing Distributed Training for LLMs by Oak Ridge National Laboratory.

We are hosting a BuzzRobot virtual talk on March 21st with Sajal Dash, a Research Scientist from Oak Ridge National Laboratory. As you know, training LLMs with billions of parameters introduces serious challenges and demands enormous computational resources. Training a one-trillion-parameter GPT-style model on 20 trillion tokens requires about 120 million exaFLOPs of computation, that is, roughly 1.2 × 10^26 floating-point operations in total.
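For a rough sanity check on that number, here is a minimal Python sketch. It assumes the widely used rule of thumb of about 6 floating-point operations per parameter per token, which is our assumption and not something stated in the talk description. The sustained throughput value is purely hypothetical and is only there to turn the total into a wall-clock intuition.

```python
# Back-of-the-envelope training compute, assuming the common
# "6 * parameters * tokens" rule of thumb (an assumption, not from the talk).

EXA = 1e18  # one exaFLOP = 10^18 floating-point operations


def training_flops(n_params: float, n_tokens: float,
                   flops_per_param_token: float = 6.0) -> float:
    """Approximate total floating-point operations for one training run."""
    return flops_per_param_token * n_params * n_tokens


def days_at(total_flops: float, sustained_exaflops_per_s: float) -> float:
    """Wall-clock days at a given sustained rate (a hypothetical input)."""
    seconds = total_flops / (sustained_exaflops_per_s * EXA)
    return seconds / 86_400


total = training_flops(n_params=1e12, n_tokens=20e12)
million_exaflops = total / EXA / 1e6

print(f"total compute: {total:.1e} FLOPs")
print(f"that is about {million_exaflops:.0f} million exaFLOPs")
# 1.0 exaFLOP/s sustained is a hypothetical, idealized rate.
print(f"days at 1 exaFLOP/s sustained: {days_at(total, 1.0):.0f}")
```

Under this assumption the estimate comes out to the same 120 million exaFLOPs, and even an idealized machine sustaining one exaFLOP per second would need years, which is why the parallelization strategy matters so much.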

In this talk, Sajal will explore distributed training strategies to extract this computation from Frontier, the exascale supercomputer, and will share insights about certain parallel training techniques that would allow for efficient training of a trillion-parameter model.

Upcoming Talk: Recent Advancements in Robot Systems Deployment by Stanford AI Lab.

On March 28th, we have another virtual talk presented by Zipeng Fu from Stanford AI Lab. Zipeng will guide us through recent advancements in the deployment of robot learning systems, with an emphasis on their scalability and deployability to open-world problems. He'll cover two main paradigms of learning-based methods for robotics: reinforcement learning and imitation learning, based on his recent research work, including Mobile ALOHA and Robot Parkour Learning, among others.

Video Recording: Top AI Researchers Give Their Prediction on How AI Systems Will Evolve Further.

We recently hosted a talk by Fabienne Sandkühler from the University of Bonn, a co-author of the survey "Thousands of AI Authors on the Future of AI". Her collaborators surveyed almost 3,000 AI researchers to gather their predictions on how AI systems will further evolve and, let's be straightforward, when they expect AI to fully automate human labor or cause human extinction.

50% of researchers expect full automation by 2116 – basically, our grandkids will have to deal with that. Interestingly, in the 2022 survey, the answer to the same question was 2164. The timeline has shortened by almost 50 years.

Around 10% of respondents believe that human inability to control AI systems will cause human extinction. Personally, I can't imagine how it would even be possible to control something that is way more intelligent than all humans combined. What are your thoughts on that?

The full video is on the BuzzRobot YouTube channel with more predictions and concerns from AI researchers on the further development of AI.

A Short Summary of the talk: Ferran Alet, a Research Scientist at Google DeepMind, on Graph Neural Networks (GNNs) for More Powerful Weather Prediction.

Introducing GraphCast: A GNN-Based Forecasting System That Beats Physics-Based Models

The talk highlights GraphCast, a weather forecasting system built using GNNs. GraphCast predicts over 200 variables, 40 steps ahead, which comes out to about 35 GB of output; it's like predicting a movie. The Google DeepMind team believes these are the largest output examples ever used in machine learning.

In a nutshell, GraphCast is a learned simulator based on graph neural networks (GNNs) with built-in inductive biases such as autoregression. The idea is the following: the physics of today is the same as the physics of tomorrow, which means you only need to learn a function that takes the weather of today and predicts the weather six hours from now. And if you want to predict 10 days ahead, just apply the function repeatedly: 40 six-hour steps.
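To make the "apply the function multiple times" idea concrete, here is a minimal Python sketch of an autoregressive rollout. The `toy_step` function is a made-up stand-in for the learned six-hour model, and the two-number "state" is purely illustrative; the real model works on a far larger weather state.

```python
from typing import Callable, List

State = List[float]  # toy "weather state"; purely illustrative


def rollout(step_fn: Callable[[State], State],
            state: State, n_steps: int) -> List[State]:
    """Apply a one-step model repeatedly, feeding each output back in."""
    trajectory = []
    for _ in range(n_steps):
        state = step_fn(state)      # the prediction becomes the next input
        trajectory.append(state)
    return trajectory


def toy_step(state: State) -> State:
    """Hypothetical stand-in for the learned 6-hour prediction function."""
    return [0.9 * x + 1.0 for x in state]


STEP_HOURS = 6
n_steps = (10 * 24) // STEP_HOURS   # 10 days of 6-hour steps

forecast = rollout(toy_step, [15.0, 20.0], n_steps)
print(f"{n_steps} steps, final state: {forecast[-1]}")
```

Ten days at six hours per step is 40 applications of the same function, which matches the 40 steps mentioned above.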

So what does that function consist of?

  1. Encoder, which maps the input to a multi-mesh. This approach helps save computation, and another benefit is that the mesh is uniformly distributed across the Earth, unlike, for example, a regular latitude-longitude grid, which crowds points near the poles.

  2. Processor, which propagates latent states across the mesh.

  3. Decoder, which decodes information from the mesh and maps it back to the state space, thus making predictions for every place on Earth.
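The three stages above can be sketched on a tiny graph. This is only a toy in Python: in the real system each stage is a learned neural network, whereas here plain averaging is swapped in so the data flow is easy to follow. All names, the six grid points, and the three-node mesh are hypothetical.

```python
from typing import List, Tuple


def encode(grid_values: List[float], grid_to_mesh: List[int],
           n_mesh: int) -> List[float]:
    """Encoder: average the grid points assigned to each mesh node."""
    sums = [0.0] * n_mesh
    counts = [0] * n_mesh
    for g, m in enumerate(grid_to_mesh):
        sums[m] += grid_values[g]
        counts[m] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def process(latent: List[float], edges: List[Tuple[int, int]],
            n_rounds: int = 1) -> List[float]:
    """Processor: each node mixes its state with the mean of its neighbors."""
    for _ in range(n_rounds):
        incoming: List[List[float]] = [[] for _ in latent]
        for src, dst in edges:
            incoming[dst].append(latent[src])
        latent = [
            0.5 * h + 0.5 * (sum(msgs) / len(msgs)) if msgs else h
            for h, msgs in zip(latent, incoming)
        ]
    return latent


def decode(latent: List[float], grid_to_mesh: List[int]) -> List[float]:
    """Decoder: copy each mesh node's state back to its grid points."""
    return [latent[m] for m in grid_to_mesh]


grid_values = [10.0, 12.0, 20.0, 22.0, 30.0, 32.0]   # six grid points
grid_to_mesh = [0, 0, 1, 1, 2, 2]                    # three mesh nodes
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]             # mesh nodes in a line

latent = encode(grid_values, grid_to_mesh, n_mesh=3)
latent = process(latent, edges, n_rounds=1)
prediction = decode(latent, grid_to_mesh)
print(prediction)
```

Working on a small mesh instead of every grid point is where the computational savings come from: the expensive message passing touches three nodes here, not six.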

GraphCast is a really large GNN model with 41k nodes and 328k edges. It showed much better results than top traditional physics-based models. A 15% improvement on one variable is only part of the picture, since there are 2,760 variables in total: GraphCast showed improvement on 90% of them compared to traditional models. Interestingly, the updated version of the model raised this figure to 97%.

Can Chaos Be Predicted? Or Meet GenCast, a Generative Model for Ensemble Weather Forecasting

While GraphCast produces deterministic forecasts, the talk also covers GenCast, a model for probabilistic weather forecasting. The model predicts different variations of the future, or, as Ferran called it, ensemble forecasting, with, let’s say, 50 samples of what the weather could look like in the future.

The key property of GenCast is that it’s a conditional diffusion model: it is conditioned on current and past weather to predict the future weather state. Because weather is a chaotic system, models like GenCast that predict a range of future outcomes are well suited to it. Interestingly, GenCast also allowed the researchers to achieve a longer prediction horizon: 15 days with GenCast vs. 10 days with GraphCast.
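Why does a chaotic system call for an ensemble? The Python sketch below is not a diffusion model and has nothing to do with GenCast's internals. It swaps in a standard toy chaotic system (the logistic map) and 50 members that start from almost identical states, just to show how quickly tiny differences grow into a wide range of outcomes. The function names, noise level, and step count are all hypothetical choices.

```python
import random
from typing import List


def chaotic_step(x: float) -> float:
    """Logistic map with r = 4, a standard toy example of chaos."""
    return 4.0 * x * (1.0 - x)


def ensemble_forecast(x0: float, n_members: int = 50, n_steps: int = 30,
                      noise: float = 1e-6, seed: int = 0) -> List[List[float]]:
    """Roll out many members that start from nearly identical states."""
    rng = random.Random(seed)
    members = [x0 + rng.uniform(-noise, noise) for _ in range(n_members)]
    history = [members]
    for _ in range(n_steps):
        members = [chaotic_step(x) for x in members]
        history.append(members)
    return history


def spread(members: List[float]) -> float:
    """Range covered by the ensemble at one time step."""
    return max(members) - min(members)


history = ensemble_forecast(0.3)
print(f"spread after  1 step : {spread(history[1]):.2e}")
print(f"spread after 30 steps: {spread(history[-1]):.2e}")
```

A single deterministic run would report just one of these trajectories; the ensemble makes the growing uncertainty visible.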

The Impact of GNN-Based Forecasting on the Industry

The GraphCast model was applied to important scientific problems and showed better results compared to physics-based models. Just a few examples: 

  • Cyclone tracks extracted from GraphCast’s forecasts were evaluated against IBTrACS, a global dataset of tropical cyclones, over 4 years of data. GraphCast showed a 9-hour accuracy advantage over physics-based models.

  • Running in real time, GraphCast correctly predicted 9 days in advance that landfall would occur in Nova Scotia. Traditional approaches could predict it only 6 days ahead.

The work of Ferran and his team has changed how weather prediction is done. It was presented to the European Weather Agency, which decided to adopt GraphCast’s approach. The agency switched to GPU-based clusters (previously it used only CPU-based ones) and started hiring machine learning engineers to improve these models.
