Framework of DICE-Talk. Our method comprises three key components: a disentangled emotion embedder, correlation-enhanced emotion conditioning, and an emotion discrimination objective. Together, these components decouple identity representations from emotional cues while preserving facial articulation details, producing lifelike animated portraits with nuanced emotional expressions.
@article{tan2025dicetalk,
title={Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation},
author={Tan, Weipeng and Lin, Chuming and Xu, Chengming and Xu, FeiFan and Hu, Xiaobin and Ji, Xiaozhong and Zhu, Junwei and Wang, Chengjie and Fu, Yanwei},
journal={arXiv preprint arXiv:2504.18087},
year={2025}
}