Excerpt from the video description on the Microsoft Developer YouTube channel:
Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use of these models, including prompting strategies, finetuning, the rapidly growing ecosystem of tools, and their future extensions.
Speaker: Andrej Karpathy
Session Information: This video is one of many sessions delivered for the Microsoft Build 2023 event.
I. Training process for language models, GPT in particular, in four stages that follow one another in this order (see the sketch after the list):
1. Pretraining > Base model
2. Supervised Finetuning > SFT model
3. Reward Modeling > RM model
4. Reinforcement Learning > RL model
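To make the data flow between these four stages concrete, here is a minimal, purely illustrative Python sketch. Every function and variable name is a hypothetical stand-in (no real training library is assumed); the point is that each stage starts from the weights produced by the previous one.

```python
# Purely illustrative outline of the four-stage pipeline above.
# All names are hypothetical stand-ins, not a real library API.

def pretrain(raw_corpus):
    """Stage 1 -> Base model: next-token prediction over a huge,
    unlabeled text corpus; this is where almost all compute goes."""
    base_model = ...  # e.g. a Transformer trained with a cross-entropy loss
    return base_model

def supervised_finetune(base_model, demonstrations):
    """Stage 2 -> SFT model: same objective, but on a small set of
    high-quality (prompt, ideal response) pairs written by humans."""
    sft_model = ...  # base_model weights, further trained on demonstrations
    return sft_model

def train_reward_model(sft_model, comparisons):
    """Stage 3 -> RM model: from human rankings of candidate responses
    to the same prompt, learn a scalar score reward(prompt, response)."""
    reward_model = ...  # trained with a ranking loss over the comparisons
    return reward_model

def reinforcement_learn(sft_model, reward_model, prompts):
    """Stage 4 -> RL model: sample responses from the policy (initialized
    from the SFT model) and optimize them against the reward model,
    e.g. with PPO (Schulman et al., 2017)."""
    rl_model = ...  # policy after reinforcement learning against the RM
    return rl_model

# Each stage consumes the previous stage's model:
# base -> SFT -> RM -> RL, the chain behind ChatGPT-style assistants.
```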
II. How to use a GPT assistant in the application you want to build
Concrete use cases
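For instance, a minimal sketch of wiring a GPT assistant into application code, assuming the OpenAI Python SDK in its 2023-era (v0.x) form; the model name, system prompt, and user message are illustrative placeholders.

```python
import openai  # pip install openai (the v0.x SDK current at Build 2023)

openai.api_key = "sk-..."  # in practice, read this from an environment variable

# Chat-formatted request: the system message sets the assistant's behavior,
# the user message carries the actual task.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize RLHF in two sentences."},
    ],
    temperature=0.7,  # sampling temperature; lower = more deterministic
)

print(response["choices"][0]["message"]["content"])
```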
Main sources cited on the presentation slides:
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language Models are Few-Shot Learners [Internet]. arXiv; 2020 [cited 2024 May 25]. Available from: http://arxiv.org/abs/2005.14165
Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2303.12712
Jang E. Can LLMs Critique and Iterate on Their Own Outputs? [Internet]. Eric Jang. 2023 [cited 2023 Jun 5]. Available from: https://evjang.com/2023/03/26/self-reflection.html
Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large Language Models are Zero-Shot Reasoners [Internet]. arXiv; 2023 [cited 2024 Feb 8]. Available from: http://arxiv.org/abs/2205.11916
Köpf A, Kilcher Y, von Rütte D, Anagnostidis S, Tam ZR, Stevens K, et al. OpenAssistant Conversations -- Democratizing Large Language Model Alignment [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2304.07327
LMSYS Org. Chatbot Arena Leaderboard Updates (Week 2) [Internet]. LMSYS Org. 2023 [cited 2023 Jun 5]. Available from: https://lmsys.org/blog/2023-05-10-leaderboard
Radford A, Narasimhan K, Salimans T, Sutskever I. Improving Language Understanding by Generative Pre-Training. 2018 [cited 2023 May 25]. Available from: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language Models are Unsupervised Multitask Learners. 2019 [cited 2023 May 25]. Available from: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Zettlemoyer L, et al. Toolformer: Language Models Can Teach Themselves to Use Tools [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2302.04761
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal Policy Optimization Algorithms [Internet]. arXiv; 2017 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/1707.06347
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature. 2017 Oct 1;550:354-9.
Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2203.11171
Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In: 36th Conference on Neural Information Processing Systems [Internet]. New Orleans; 2022 [cited 2023 May 25]. Available from: https://arxiv.org/pdf/2201.11903
Yao S, Yu D, Zhao J, Shafran I, Griffiths TL, Cao Y, et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In: 37th Conference on Neural Information Processing Systems [Internet]. New Orleans; 2023a [cited 2023 May 25]. Available from: https://arxiv.org/pdf/2305.10601
Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, et al. ReAct: Synergizing Reasoning and Acting in Language Models [Internet]. arXiv; 2023b [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2210.03629
Zhou Y, Muresanu AI, Han Z, Paster K, Pitis S, Chan H, et al. Large Language Models Are Human-Level Prompt Engineers [Internet]. arXiv; 2023 [cited 2024 Jul 12]. Available from: http://arxiv.org/abs/2211.01910
Bibliographic reference and access to the video: State of GPT | BRK216HFS. 2023 [cited 2023 Jun 2]. Available from: https://www.youtube.com/watch?v=bZQun8Y4L2A
Related
The slide deck of the presentation, in PDF format, is available on Andrej Karpathy's website:
Andrej Karpathy. State of GPT @ Microsoft Build 2023 (slides) [Internet] [cited 2023 Jul 12]. Available from: https://karpathy.ai/stateofgpt.pdf