Abstract

Large language models (LLMs) are increasingly applied in clinical communication, yet their reliability depends on high-quality conversational corpora. Real-world doctor–patient recordings are frequently degraded by noise, transcription errors, speaker overlap, and fragmented dialogue structure, limiting their usability for downstream model training.