The MUST model evaluation exercise: Patterns in model performance