Submitted: 24 January 2026

Video diffusion models integrate visual, temporal, and textual signals, creating potential pathways for cross-modal bias transfer. This paper studies how alignment tuning affects the transmission of social bias between text and visual modalities in video generation. We evaluate 14,200 text-to-video samples using a cross-modal attribution framework that decomposes bias contributions across input modalities.