AgentInstruct Uses Agentic Flows To Create Synthetic Training Data

For a while now, Microsoft has been working on creating purpose-specific, fine-grained, high-quality synthetic training data.

Cobus Greyling

This training data is created by larger, more capable models, with specific guardrails to ensure the generated data is high-quality, diverse and nuanced. Previous methods relied on human supervision and annotation to ensure the data is nuanced at a very granular level; now agentic flows are used.

AgentInstruct is an extensible agentic framework for automatically creating large amounts of diverse and high-quality synthetic data.

Consider for a moment the card below, on how Microsoft trained the Phi-3 SLM. This simple procedure outlines how they ensured the data is diverse and repetition is avoided. […]
Original web page at cobusgreyling.medium.com