GPT
Precursors
Proximal Policy Optimization (PPO) - an RL algorithm that performs comparably to or better than state-of-the-art approaches while being much simpler to implement and tune; it became the default reinforcement learning algorithm at OpenAI.
Learning from human preferences (human in the loop) - a method for inferring what humans want by asking them which of two proposed behaviors is better.
InstructGPT - arguably better at following user intentions than GPT-3, while also being more truthful and less toxic; trained with humans in the loop.
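The heart of PPO is its clipped surrogate objective, which keeps the updated policy close to the one that collected the data. A minimal sketch with made-up numbers (toy log-probabilities and advantages; a real implementation also needs a value loss and an entropy bonus):

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized)."""
    ratio = np.exp(new_logp - old_logp)               # pi_new / pi_old per action
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # Take the pessimistic (smaller) objective, then negate for gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy batch: three actions' log-probs under the new and old policy (made up).
new_logp = np.array([-0.9, -1.1, -0.4])
old_logp = np.array([-1.0, -1.0, -1.0])
adv = np.array([1.0, -0.5, 2.0])
print(ppo_clip_loss(new_logp, old_logp, adv))
```

The clip stops any single large probability ratio (here the third action, ratio ≈ 1.82) from dominating the update.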
Articles
What Is ChatGPT Doing … and Why Does It Work? - explains next-word prediction in detail.
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study - "PPO is able to surpass other alignment methods in all cases and achieve state-of-the-art results in challenging code competitions."
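The next-word prediction described in the first article above can be illustrated with a toy sampler: the model outputs a score (logit) per token, and the next token is drawn from the resulting probability distribution, with a temperature knob controlling randomness. The vocabulary and logits here are hypothetical:

```python
import numpy as np

def next_word(logits, temperature=1.0, rng=None):
    """Sample the index of the next token from temperature-scaled logits."""
    rng = rng or np.random.default_rng(0)
    z = logits / temperature
    z = z - z.max()                  # subtract max for numerical stability
    probs = np.exp(z)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

vocab = ["cat", "dog", "mat", "sat"]       # hypothetical tiny vocabulary
logits = np.array([2.0, 0.5, 1.0, 0.1])    # made-up model scores
print(vocab[next_word(logits, temperature=0.7)])
```

As temperature approaches zero the sampler becomes greedy (always the highest-scoring token); higher temperatures make the output more varied.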
Competitions
GPT-4 Hackathon code results
Tools
Sentence Embeddings
Virtual assistants
FlowGPT - a collection of community bots and prompts.
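A sentence embedding maps a sentence to a fixed-size vector so that similar sentences land close together. A minimal sketch using mean-pooled word vectors (the 3-d embedding table is entirely made up; real tools use transformer models such as sentence-transformers):

```python
import numpy as np

WORD_VECS = {                        # hypothetical tiny embedding table
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.9, 0.3, 0.1]),
    "dog": np.array([0.8, 0.4, 0.2]),
    "sat": np.array([0.2, 0.7, 0.5]),
}

def embed(sentence):
    """Average the word vectors of known words (mean pooling)."""
    vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embed("the cat sat"), embed("the dog sat")))
```

Sentences that share most words score near 1.0 under cosine similarity, which is the usual comparison metric for embeddings.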