Excerpt from the video description on the Microsoft Developer YouTube channel:
Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use of these models, including prompting strategies, finetuning, the rapidly growing ecosystem of tools, and their future extensions.
Speaker: Andrej Karpathy
Session Information: This video is one of many sessions delivered for the Microsoft Build 2023 event.
I. Training process for language models, GPT in particular, in four stages that follow one another in this order (see the sketch after the list):
1. Pretraining > Base model
2. Supervised Finetuning > SFT model
3. Reward Modeling > RM model
4. Reinforcement Learning > RL model
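To make the data flow between these four stages concrete, here is a minimal, purely illustrative Python sketch. Every function and variable name is a hypothetical stand-in (no real training library is assumed); the point is that each stage starts from the weights produced by the previous one.

```python
# Purely illustrative outline of the four-stage pipeline above.
# All names are hypothetical stand-ins, not a real library API.

def pretrain(raw_corpus):
    """Stage 1 -> Base model: next-token prediction over a huge,
    unlabeled text corpus; this is where almost all compute goes."""
    base_model = ...  # e.g. a Transformer trained with a cross-entropy loss
    return base_model

def supervised_finetune(base_model, demonstrations):
    """Stage 2 -> SFT model: same objective, but on a small set of
    high-quality (prompt, ideal response) pairs written by humans."""
    sft_model = ...  # base_model weights, further trained on demonstrations
    return sft_model

def train_reward_model(sft_model, comparisons):
    """Stage 3 -> RM model: from human rankings of candidate responses
    to the same prompt, learn a scalar score reward(prompt, response)."""
    reward_model = ...  # trained with a ranking loss over the comparisons
    return reward_model

def reinforcement_learn(sft_model, reward_model, prompts):
    """Stage 4 -> RL model: sample responses from the policy (initialized
    from the SFT model) and optimize them against the reward model,
    e.g. with PPO (Schulman et al., 2017)."""
    rl_model = ...  # policy after reinforcement learning against the RM
    return rl_model

# Each stage consumes the previous stage's model:
# base -> SFT -> RM -> RL, the chain behind ChatGPT-style assistants.
```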
II. How to use a GPT assistant in the application you want to build
Concrete use cases
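For instance, a minimal sketch of wiring a GPT assistant into application code, assuming the OpenAI Python SDK in its 2023-era (v0.x) form; the model name, system prompt, and user message are illustrative placeholders.

```python
import openai  # pip install openai (the v0.x SDK current at Build 2023)

openai.api_key = "sk-..."  # in practice, read this from an environment variable

# Chat-formatted request: the system message sets the assistant's behavior,
# the user message carries the actual task.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize RLHF in two sentences."},
    ],
    temperature=0.7,  # sampling temperature; lower = more deterministic
)

print(response["choices"][0]["message"]["content"])
```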
Main sources cited on the presentation slides:
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language Models are Few-Shot Learners [Internet]. arXiv; 2020 [cited 2024 May 25]. Available from: http://arxiv.org/abs/2005.14165
Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2303.12712
Jang E. Can LLMs Critique and Iterate on Their Own Outputs? [Internet]. Eric Jang. 2023 [cited 2023 Jun 5]. Available from: https://evjang.com/2023/03/26/self-reflection.html
Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large Language Models are Zero-Shot Reasoners [Internet]. arXiv; 2023 [cited 2024 Feb 8]. Available from: http://arxiv.org/abs/2205.11916
Köpf A, Kilcher Y, von Rütte D, Anagnostidis S, Tam ZR, Stevens K, et al. OpenAssistant Conversations -- Democratizing Large Language Model Alignment [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2304.07327
LMSYS Org. Chatbot Arena Leaderboard Updates (Week 2) [Internet]. LMSYS Org. 2023 [cited 2023 Jun 5]. Available from: https://lmsys.org/blog/2023-05-10-leaderboard
Radford A, Narasimhan K, Salimans T, Sutskever I. Improving Language Understanding by Generative Pre-Training. 2018 [cited 2023 May 25]. Available from: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language Models are Unsupervised Multitask Learners. 2019 [cited 2023 May 25]. Available from: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Zettlemoyer L, et al. Toolformer: Language Models Can Teach Themselves to Use Tools [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2302.04761
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal Policy Optimization Algorithms [Internet]. arXiv; 2017 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/1707.06347
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature. 2017 Oct 1;550:354-9.
Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2203.11171
Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In: 36th Conference on Neural Information Processing Systems [Internet]. New Orleans; 2022 [cited 2023 May 25]. Available from: https://arxiv.org/pdf/2201.11903
Yao S, Yu D, Zhao J, Shafran I, Griffiths TL, Cao Y, et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In: 37th Conference on Neural Information Processing Systems [Internet]. New Orleans; 2023a [cited 2023 May 25]. Available from: https://arxiv.org/pdf/2305.10601
Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, et al. ReAct: Synergizing Reasoning and Acting in Language Models [Internet]. arXiv; 2023b [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2210.03629
Zhou Y, Muresanu AI, Han Z, Paster K, Pitis S, Chan H, et al. Large Language Models Are Human-Level Prompt Engineers [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2211.01910
Bibliographic reference and access to the video: State of GPT | BRK216HFS. 2023 [cited 2023 Jun 2]. Available from: https://www.youtube.com/watch?v=bZQun8Y4L2A
Related
The slide deck of the presentation, in PDF format, is available on Andrej Karpathy's website:
Andrej Karpathy. State of GPT @ Microsoft Build 2023 (slides) [Internet] [cited 2023 Jul 12]. Available from: https://karpathy.ai/stateofgpt.pdf