Abstract

Systematic cross-modality inference and integration of pathological morphologies and multilayer molecular profiles have advanced disease biology; however, methodological challenges remain in multimodal learning. Here, we present Multi-Embed, a unified and interpretable framework for multimodal learning between multilevel morphologies and multilayer molecular profiles.