State of GPT | BRK216HFS

Andrej Karpathy @ Microsoft Developer, 2023

Excerpt from the description text of the video on the Microsoft Developer YouTube channel:

Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use of these models, including prompting strategies, finetuning, the rapidly growing ecosystem of tools, and their future extensions. 
Speaker:  Andrej Karpathy 
Session Information: This video is one of many sessions delivered for the Microsoft Build 2023 event.



Specific points covered in the video:

I. How to train a GPT assistant

    The training process for language models, GPT in particular, in four stages carried out in this order (a brief sketch follows the list):

    1. Pretraining > Base model

    2. Supervised Finetuning > SFT model

    3. Reward Modeling > RM model 

    4. Reinforcement Learning > RL model 
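
A minimal Python sketch of how these four stages chain into one another. All function names, arguments, and dataset descriptions below are illustrative placeholders, not part of the talk or of any real training framework:

```python
# Hypothetical sketch of the four-stage GPT assistant training pipeline.
# Every name here is a stand-in; real training involves large-scale
# infrastructure that this sketch does not attempt to represent.

def pretrain(raw_text_corpus):
    """Stage 1: next-token prediction over a large unlabeled corpus -> base model."""
    return {"stage": "base", "data": raw_text_corpus}

def supervised_finetune(base_model, prompt_response_pairs):
    """Stage 2: continue training on curated prompt/response demonstrations -> SFT model."""
    return {"stage": "sft", "init": base_model, "data": prompt_response_pairs}

def train_reward_model(sft_model, ranked_comparisons):
    """Stage 3: learn to score candidate responses from human preference rankings -> RM model."""
    return {"stage": "rm", "init": sft_model, "data": ranked_comparisons}

def reinforcement_learn(sft_model, reward_model):
    """Stage 4: optimize the SFT policy against the reward model (RLHF, e.g. PPO) -> RL model."""
    return {"stage": "rl", "policy": sft_model, "reward": reward_model}

base = pretrain("web-scale unlabeled text")
sft = supervised_finetune(base, "curated demonstrations")
rm = train_reward_model(sft, "human rankings of candidate responses")
assistant = reinforcement_learn(sft, rm)
print(assistant["stage"])  # "rl": the RLHF assistant model
```

As presented in the talk, each stage initializes from the model produced by the previous one: the SFT model starts from the base model, the reward model from the SFT model, and reinforcement learning optimizes the SFT policy against the reward model.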

II. How to use a GPT assistant in the application you want to build

Concrete use cases, illustrated with a brief prompting sketch below
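
As a purely illustrative example of the prompting strategies mentioned in the video description (few-shot examples and chain-of-thought; see Wei et al. and Kojima et al. in the sources below), here is a small Python sketch that assembles such prompts as plain strings. The example task and wording are invented for this illustration and are not taken from the talk:

```python
# Illustrative sketch of two prompting strategies: few-shot worked examples
# and the zero-shot "Let's think step by step" trigger (Kojima et al.).
# The worked example below is invented for illustration only.

FEW_SHOT_EXAMPLES = [
    ("Q: A shop sells 3 pens for 6 euros. How much do 5 pens cost?",
     "A: Each pen costs 6 / 3 = 2 euros, so 5 pens cost 5 * 2 = 10 euros."),
]

def build_prompt(question: str, zero_shot_cot: bool = False) -> str:
    """Build either a zero-shot chain-of-thought prompt or a few-shot prompt."""
    if zero_shot_cot:
        return f"Q: {question}\nA: Let's think step by step."
    shots = "\n\n".join(f"{q}\n{a}" for q, a in FEW_SHOT_EXAMPLES)
    return f"{shots}\n\nQ: {question}\nA:"

print(build_prompt("A train travels 120 km in 2 hours. What is its speed?"))
```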


Main sources cited on the presentation slides:

Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language Models are Few-Shot Learners [Internet]. arXiv; 2020 [cited May 25, 2024]. Available from: http://arxiv.org/abs/2005.14165

Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 [Internet]. arXiv; 2023 [cited July 12, 2024]. Available from: http://arxiv.org/abs/2303.12712

Jang E. Can LLMs Critique and Iterate on Their Own Outputs? [Internet]. Eric Jang. 2023 [cited June 5, 2023]. Available from: https://evjang.com/2023/03/26/self-reflection.html

Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large Language Models are Zero-Shot Reasoners [Internet]. arXiv; 2023 [cited February 8, 2024]. Available from: http://arxiv.org/abs/2205.11916

Köpf A, Kilcher Y, von Rütte D, Anagnostidis S, Tam ZR, Stevens K, et al. OpenAssistant Conversations -- Democratizing Large Language Model Alignment [Internet]. arXiv; 2023 [cited July 12, 2024]. Available from: http://arxiv.org/abs/2304.07327

LMSYS Org. Chatbot Arena Leaderboard Updates (Week 2) [Internet]. LMSYS Org. 2023 [cited June 5, 2023]. Available from: https://lmsys.org/blog/2023-05-10-leaderboard

Radford A, Narasimhan K, Salimans T, Sutskever I. Improving Language Understanding by Generative Pre-Training. 2018 [cited May 25, 2023]. Available from: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language Models are Unsupervised Multitask Learners. 2019 [cited May 25, 2023]. Available from: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Zettlemoyer L, et al. Toolformer: Language Models Can Teach Themselves to Use Tools [Internet]. arXiv; 2023 [cited July 12, 2024]. Available from: http://arxiv.org/abs/2302.04761

Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal Policy Optimization Algorithms [Internet]. arXiv; 2017 [cited July 12, 2024]. Available from: http://arxiv.org/abs/1707.06347

Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature. October 1, 2017;550:354-9.

Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models [Internet]. arXiv; 2023 [cited July 12, 2024]. Available from: http://arxiv.org/abs/2203.11171

Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In: 36th Conference on Neural Information Processing Systems [Internet]. New Orleans; 2022 [cited May 25, 2023]. Available from: https://arxiv.org/pdf/2201.11903

Yao S, Yu D, Zhao J, Shafran I, Griffiths TL, Cao Y, et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In: 37th Conference on Neural Information Processing Systems [Internet]. New Orleans; 2023a [cited May 25, 2023]. Available from: https://arxiv.org/pdf/2305.10601

Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, et al. ReAct: Synergizing Reasoning and Acting in Language Models [Internet]. arXiv; 2023b [cited July 12, 2024]. Available from: http://arxiv.org/abs/2210.03629

Zhou Y, Muresanu AI, Han Z, Paster K, Pitis S, Chan H, et al. Large Language Models Are Human-Level Prompt Engineers [Internet]. arXiv; 2023 [cited July 12, 2024]. Available from: http://arxiv.org/abs/2211.01910


Bibliographic reference and access to the video: State of GPT | BRK216HFS. 2023 [cited June 2, 2023]. Available from: https://www.youtube.com/watch?v=bZQun8Y4L2A


Related

The presentation slides in PDF format are available on Andrej Karpathy's website.

Andrej Karpathy. State of GPT @ Microsoft Build 2023 (slides) [online] [cited July 12, 2023]. Available from: https://karpathy.ai/stateofgpt.pdf

 
