Trillion Parameter Consortium (TPC)

The overarching focus of the consortium is to bring together groups interested in building, training, and using large-scale models with those who are building and operating large-scale computing systems. The target community encompasses (a) those working on AI methods development, natural language processing/multimodal approaches and architectures, full-stack implementations, scalable libraries and frameworks, AI workflows, data aggregation, cleaning, and organization, training runtimes, model evaluation, downstream adaptation, alignment, etc.; (b) those who design and build hardware and software systems; and (c) those who will ultimately use the resulting AI systems to attack a range of problems in science, engineering, medicine, and other domains.

TPC Sydney Program Committee: Jingbo Wang (NCI), Rio Yokota (TiTech), Arvind Ramanathan (Argonne), Prasanna Balaprakash (ORNL), Neeraj Kumar (PNNL), Valerie Taylor (Argonne), Tong Xie (UNSW), Ian Foster (Argonne/UChicago), and Ravi Madduri (Argonne).

Agenda

Monday 19th February (9am – 5pm)

Introduction to AI for Science

Session 1 – Introduction to AI for Science
Session 2 – Foundation Models in AI
Session 3 – Using Pre-Trained Models

Adapting and Fine-Tuning Models for Science

Session 4 – Adapting Models for Scientific Data
Session 5 – Hands-on Workshop
Wrap-up and Q&A
TPC Networking and Informal Discussion

Tuesday 20th February (9am – 12pm)

Meeting room C3.2

LLMs for Science Hackathon
The emergence of foundation models has marked a pivotal shift, offering new horizons for the development of AI-enabled scientific tools across diverse fields such as chemistry, materials science, biology, and climate science. This hackathon is dedicated to exploring the impact of foundation models, particularly large language models (LLMs), on the analysis and interpretation of scientific data. It aims to acquaint participants with the processes of selecting, developing, and fine-tuning foundation models to address the distinct challenges of scientific exploration.

During the event, attendees will be introduced to an array of potential projects that demonstrate the potential of LLMs in scientific contexts. This will serve as a platform for participants to present their own project ideas, encouraging a dynamic and cooperative atmosphere. The hackathon will support team formation and project selection, emphasizing the synergy of diverse skills and interests to cultivate effective collaboration. As teams dive into exploration and practical experimentation with LLMs, they will engage in intensive discussion and coding sessions geared towards optimizing LLMs for use with scientific datasets across various fields, using Jupyter Notebooks and GitHub as the primary tools.

By participating in this hackathon, individuals will gain hands-on experience in leveraging cutting-edge AI technologies for scientific inquiry. They will emerge with the competencies and insights needed to employ foundation models in pushing the boundaries of scientific research, laying the groundwork for future AI-driven innovations in science.
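
As a flavour of the hands-on work, the sketch below shows a minimal causal-language-model fine-tuning loop built on the Hugging Face transformers and datasets libraries. The base model ("gpt2") and the corpus file (abstracts.txt) are placeholders only; hackathon teams will choose their own models and scientific datasets.

```python
# Minimal sketch of the kind of fine-tuning exercise the hackathon targets.
# The model name and data file are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"                      # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical corpus of scientific abstracts, one document per line.
raw = load_dataset("text", data_files={"train": "abstracts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```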

Agenda

  • 9:00-9:15: Welcome and recap of Monday tutorial (Arvind/Venkat/Neeraj)
  • 9:15-10:00: Hackathon theme introduction & project pitches (Neeraj/Venkat)
  • 10:00-10:30: Team formation & project selection
      ◦ Based on participant interests and skillsets, help participants form teams and solidify project ideas.
  • 10:30-12:00: Dive into LLM exploration & experimentation
      ◦ Guided sessions on key aspects relevant to scientific LLM applications
      ◦ Hands-on coding session

Wednesday 21st February (9am – 12pm)

TPC Group Discussions
9am – 12pm
There will be three morning TPC workshop tracks, focusing on scientific data for training large-scale AI models, model architecture and performance evaluation, and federated learning.

TPC Track A: DATA for AI and LLMs for Science

Meeting room C2.4

Neeraj Kumar (PNNL), Ian Foster (UChicago/Argonne), Kyle Lo (AI2)

In an era of exponential data growth, this session addresses critical challenges and innovative strategies in harnessing the vast datasets needed to train large language models (LLMs) in domains such as chemistry and materials, biology, climate science, and high-energy physics. It will examine the complexities of building a data-focused infrastructure, including streamlined data curation pipelines, refined curation practices, and pre-training methodologies. It will also explore how incorporating domain-specific knowledge into these processes can significantly enhance the performance and applicability of LLMs in scientific research, emphasizing the critical role of targeted data selection and preparation in advancing AI capabilities. Through discussions, lightning talks, and collaborative dialogue, participants will explore cutting-edge approaches to optimizing data utility and model effectiveness, setting a foundation for breakthroughs in scientific discovery.
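
To make the curation discussion concrete, the sketch below shows one small, illustrative step of such a pipeline: exact-duplicate removal and a simple length filter over a JSON-lines corpus. The file names (raw_corpus.jsonl, curated.jsonl) and the threshold are hypothetical; real pre-training pipelines add many further stages such as near-duplicate detection, quality scoring, and domain filtering.

```python
# Illustrative curation step (placeholder file names): drop short fragments
# and exact duplicates from a JSON-lines corpus before pre-training.
import hashlib
import json

def curate(records, min_chars=200):
    """Yield unique, sufficiently long documents from an iterable of dicts."""
    seen = set()
    for rec in records:
        text = rec.get("text", "").strip()
        if len(text) < min_chars:
            continue                       # drop fragments and empty records
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue                       # drop exact duplicates
        seen.add(digest)
        yield rec

if __name__ == "__main__":
    with open("raw_corpus.jsonl") as fin, open("curated.jsonl", "w") as fout:
        records = (json.loads(line) for line in fin)
        for rec in curate(records):
            fout.write(json.dumps(rec) + "\n")
```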

TPC Track B: ARCHITECTURE for AI and LLMs for Science

Meeting room C2.5

Rio Yokota (TiTech), Jens Domke (RIKEN)

Architectures for LLMs are continuously evolving: variants of transformers, their mixture-of-experts extensions, and state-space models are being proposed on a weekly basis. Frameworks such as Megatron-LM and DeepSpeed, and their various forks, each cover a different subset of architectures, distributed parallelism schemes, and compute/memory/IO optimizations. Determining the optimal architecture for training a trillion-parameter model on scientific data, and the best framework for doing so on today's exascale platforms, is critical to creating a new class of AI models for a broad range of scientific applications.
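
As a toy illustration of one of these architectural variants, the PyTorch sketch below implements a top-1 mixture-of-experts feed-forward block of the kind that frameworks such as Megatron-LM and DeepSpeed scale across many GPUs. The dimensions and routing scheme are illustrative only and are not a fragment of any of those frameworks.

```python
# Toy top-1 mixture-of-experts feed-forward block; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        top_p, top_i = gate.max(dim=-1)        # top-1 routing per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                # scale each routed token's expert output by its gate probability
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)                  # a toy batch of token embeddings
print(MoEFeedForward()(tokens).shape)          # torch.Size([16, 512])
```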

TPC Track C: FEDERATED LEARNING for AI and LLMs for Science

Meeting room C2.6

Ravi Madduri (Argonne), Mohamed Wahib (RIKEN)

This session will provide insights into the infrastructure requirements, collaborative agreements, and other considerations needed to train LLMs for science using federated learning. Federated learning (FL) is a collaborative learning approach in which multiple data owners, referred to as clients, collectively train a model under the orchestration of a central server. Clients share the model trained on their local datasets without sharing the data itself, so FL enables the creation of more robust models without exposing local datasets or requiring the associated data-sharing and protection arrangements. Several challenges remain in adopting FL for training LLMs for science, including heterogeneous resources at different HPC centers, non-standardized scheduling of training processes, and differing resource allocation mechanisms. In this session we will discuss and evaluate different federation strategies and approaches to leveraging federated learning for fine-tuning LLMs for science.
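
For readers unfamiliar with the pattern, the sketch below illustrates a single round of federated averaging (FedAvg), the basic protocol the session builds on: each client trains on data that never leaves its site, and the server averages only the returned weights. The toy model, loss, and random data are placeholders, not the session's actual training setup.

```python
# Schematic FedAvg round: clients share weights, never data.
import copy
import torch

def local_update(model, loader, epochs=1, lr=1e-3):
    """One client's local training pass; returns its updated state dict."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(states):
    """Server step: average the clients' weights parameter by parameter."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

# Toy round with two clients holding private data the server never sees.
global_model = torch.nn.Linear(4, 1)
clients = [
    torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(torch.randn(32, 4), torch.randn(32, 1)),
        batch_size=8)
    for _ in range(2)
]
states = [local_update(global_model, loader) for loader in clients]
global_model.load_state_dict(federated_average(states))
```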

BoF — Trillion Parameter Consortium

Meeting room C3.3, 1:30pm – 3pm

As part of the SCA2024 program, Professor Rick L. Stevens will give a keynote talk, followed by a group discussion.