Orchestrating Thousands of GPUs: Engineering Patterns for Large-Scale Model Training