Framework of DICE-Talk. Our method comprises three key components: a disentangled emotion embedder, correlation-enhanced emotion conditioning, and an emotion discrimination objective. Together, these components decouple identity representations from emotional cues while preserving facial articulation details, producing lifelike animated portraits with emotionally nuanced expressions.
@misc{tan2025disentangleidentitycooperateemotion,
title={Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation},
author={Weipeng Tan and Chuming Lin and Chengming Xu and FeiFan Xu and Xiaobin Hu and Xiaozhong Ji and Junwei Zhu and Chengjie Wang and Yanwei Fu},
year={2025},
eprint={2504.18087},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.18087},
}