Adding a Translated Audio Track to a Video

Suppose you have comedies in English and you want to play them for someone who neither knows English nor wants to read Chinese subtitles. You want to add a Chinese audio track to each of them, with little human intervention and at scale.

I know you can use OpenAI Whisper to transcribe the audio (so you know what is said and when), OpenAI ChatGPT to translate the transcript into Chinese, and some TTS software to synthesize the audio.
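The three steps above can be sketched as a small planning function. This is only an illustration under stated assumptions: `Segment`, `dub_plan`, `translate`, and `synth_duration` are hypothetical names, and the real Whisper, ChatGPT, and TTS calls are deliberately left out and passed in as plain callables. The sketch also assumes that each translated line should start at the timestamp of the original line and be sped up if it runs longer than the original slot.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    """One transcribed line with its timestamps (hypothetical structure)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str     # what is said in the original language


def dub_plan(
    segments: List[Segment],
    translate: Callable[[str], str],
    synth_duration: Callable[[str], float],
) -> List[Dict[str, object]]:
    """Translate each segment and compute how much the synthesized clip
    must be sped up to fit into the original time slot.

    translate:      stand-in for the translation step (e.g. a ChatGPT call)
    synth_duration: stand-in that returns the length in seconds of the
                    TTS clip for a translated line
    """
    plan = []
    for seg in segments:
        translated = translate(seg.text)
        slot = seg.end - seg.start
        spoken = synth_duration(translated)
        # A ratio above 1.0 means the TTS clip is longer than the slot.
        rate = spoken / slot if slot > 0 else 1.0
        # Never slow the clip down; a short clip simply leaves some silence.
        plan.append({"start": seg.start, "text": translated, "rate": max(rate, 1.0)})
    return plan


if __name__ == "__main__":
    # Stand-in data and functions, for illustration only.
    demo = [Segment(0.0, 2.0, "Hello"), Segment(2.0, 3.0, "Bye")]
    lookup = {"Hello": "你好", "Bye": "再见"}
    for entry in dub_plan(demo, lookup.__getitem__, lambda line: 1.5):
        print(entry)
```

Keeping the three services behind plain callables makes it possible to batch-process many videos and to swap any one tool without touching the timing logic.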

The problem is how to preserve each character's accent (voiceprint) and intonation, which carry much of a comedy's essence. It is also important, and challenging, to retain the laughter and other background sounds.