
correct char encoding for subtitle conversion


thoror
Solved by Tikuf


I have many Korean subtitles, and they are usually encoded in CP949.

When a subtitle file is named with the .kor language specifier, the ffmpeg command includes the option -sub_charenc windows-1252:

C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\ffmpeg\ffmpeg20131011\ffmpeg.exe -sub_charenc windows-1252 -i "\\AUSTIN\media\TV Series\The Big Bang Theory\Season 1\The.Big.Bang.Theory.S01E02.720p.BluRay.x264-SiNNERS.kor.srt" "C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\cache\subtitles\1\1329e16a-7e00-b6cb-5f3f-00e0c93fbfc6.ass"

Thus, the resulting .ass file contains a bunch of gibberish:

Dialogue: 0,0:00:06.67,0:00:07.95,Default,글쎄..
Dialogue: 0,0:00:07.95,0:00:12.04,Default,언제 부풀어오를지 모르니까\N다들 하워드 잘 보고 있어
Dialogue: 0,0:00:33.65,0:00:34.57,Default,내가 나갈게
Dialogue: 0,0:00:35.25,0:00:36.95,Default,혹시 나 부었어? 부어오른 거 같아
Dialogue: 0,0:00:38.86,0:00:40.43,Default,- 레너드, 안녕하세요\N- 안녕하세요, 페니
Dialogue: 0,0:01:04.02,0:01:07.04,Default,편한대로 부르세요\N어차피 전 최저임금 받으니까요

This is true whether I encode the .srt file in CP949 or UTF-8.
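The "gibberish" is what happens when bytes are decoded with the wrong code page. A minimal Python sketch, assuming the file on disk is CP949 while the converter reads it as windows-1252; Python's codec names cp949 and cp1252 stand in for the names passed to ffmpeg, and the sample text is the first line of the output above:

```python
# Mojibake demo: Korean text stored as CP949 but decoded as windows-1252.
original = "글쎄.."

# The bytes as they would appear in a CP949-encoded .srt file.
raw = original.encode("cp949")

# Wrong codec: each byte is read as a single windows-1252 character.
# errors="replace" substitutes U+FFFD for any byte windows-1252 leaves unassigned.
wrong = raw.decode("cp1252", errors="replace")

# Right codec: the original text comes back unchanged.
right = raw.decode("cp949")

print("as windows-1252:", wrong)
print("as cp949:       ", right)
```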

Furthermore, when the .srt file is encoded in UTF-8, the sub-convert log shows a number of errors:

[subrip @ 003dbdc0] Unable to recode subtitle event "자, 여기
땅콩 뺀 <font color="#FFD700">팻 타이</font>" from windows-1252 to UTF-8
Error while decoding stream #0:0: Error number -42 occurred
[subrip @ 003dbdc0] Unable to recode subtitle event "그런데 땅콩 기름 들어간 거 아니야?" from windows-1252 to UTF-8
Error while decoding stream #0:0: Error number -42 occurred
[subrip @ 003dbdc0] Unable to recode subtitle event "벌 날아다니는 때는 아니니까
내 <font color="#FFD700">에피네프린을 쓰게 해 줄게</font>" from windows-1252 to UTF-8
Error while decoding stream #0:0: Error number -42 occurred
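These "Unable to recode" errors fit the same picture from the other direction: UTF-8 stores each Hangul syllable as three bytes, and some byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no assigned character in windows-1252, so a strict conversion fails. A small Python check on the first failing event from the log, assuming Python's cp1252 codec rejects those bytes the same way ffmpeg's conversion does:

```python
# Why a UTF-8 subtitle line cannot be "recoded from windows-1252".
line = "자, 여기"
raw = line.encode("utf-8")

# Byte values that windows-1252 leaves unassigned.
UNASSIGNED_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}
offending = [hex(b) for b in raw if b in UNASSIGNED_CP1252]
print("unassigned bytes found:", offending)

# A strict decode fails on the first unassigned byte, much like ffmpeg's
# "Unable to recode subtitle event" message.
try:
    raw.decode("cp1252")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc)
```

Here the culprit is the first syllable: 자 is EC 9E 90 in UTF-8, and 0x90 is unassigned in windows-1252.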

Would it be possible to set -sub_charenc to the correct value for the specific language (I believe this issue is not limited to Korean), or even to support an additional tag such as .kor.utf8 that also specifies the character encoding?
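One possible shape for the first suggestion, sketched in Python: if the file already decodes as UTF-8, pass no -sub_charenc at all (which matches the workaround in this thread); otherwise look up the language specifier in a table of legacy code pages. The table, the function names and the file-name parsing are hypothetical illustrations, not MediaBrowser's actual code, and the charset names are assumed to be ones ffmpeg's -sub_charenc accepts:

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical table: language specifier in the file name -> legacy code page
# that subtitles in that language commonly use. Illustrative only.
LEGACY_CHARSETS = {
    "kor": "cp949",
    "jpn": "shift_jis",
    "rus": "windows-1251",
}
DEFAULT_CHARSET = "windows-1252"  # the server default mentioned in this thread


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def sub_charenc_args(path: Path, data: bytes) -> list[str]:
    """Return the extra ffmpeg arguments for a subtitle file (hypothetical)."""
    if is_valid_utf8(data):
        # Already UTF-8: omit -sub_charenc, as in the thread's workaround.
        return []
    # "Show.S01E02.kor.srt" -> ["show", "s01e02", "kor", "srt"]; the language
    # specifier, if any, is the part just before the extension.
    parts = path.name.lower().split(".")
    lang = parts[-2] if len(parts) >= 3 else ""
    return ["-sub_charenc", LEGACY_CHARSETS.get(lang, DEFAULT_CHARSET)]


if __name__ == "__main__":
    kor = Path("The.Big.Bang.Theory.S01E02.720p.BluRay.x264-SiNNERS.kor.srt")
    print(sub_charenc_args(kor, "글쎄..".encode("cp949")))  # ['-sub_charenc', 'cp949']
    print(sub_charenc_args(kor, "글쎄..".encode("utf-8")))  # []
```

Checking for valid UTF-8 first means a .kor file that the user has already converted would no longer be forced through a legacy code page.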

 

As a workaround, if I do not include the .kor language specifier, the command is:

C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\ffmpeg\ffmpeg20131011\ffmpeg.exe -ss 14.0657032 -i "\\AUSTIN\media\TV Series\The Big Bang Theory\Season 1\The.Big.Bang.Theory.S01E01.720p.BluRay.x264-SiNNERS.srt" -ss 1 "C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\cache\subtitles\e\e9430590-01c3-b802-8909-f172b8fdb71b.ass"

This command does not contain the -sub_charenc windows-1252 option, and it works flawlessly as long as my .srt file is already encoded in UTF-8.
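The pre-encoding step of this workaround can be scripted. A minimal Python sketch, assuming every input file really is CP949; strict decoding raises on bytes that are not valid in the source encoding instead of quietly writing gibberish. The function names and file names are hypothetical:

```python
import tempfile
from pathlib import Path


def recode_to_utf8(data: bytes, source_encoding: str = "cp949") -> bytes:
    """Decode with the source code page and re-encode as UTF-8 (no BOM)."""
    # Strict decoding: bytes that are invalid in source_encoding raise
    # UnicodeDecodeError.
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str = "cp949") -> None:
    """Write a UTF-8 copy of src to dst; line endings are kept as-is."""
    dst.write_bytes(recode_to_utf8(src.read_bytes(), source_encoding))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "episode.kor.srt"
        src.write_bytes("1\r\n00:00:06,670 --> 00:00:07,950\r\n글쎄..\r\n".encode("cp949"))
        dst = Path(tmp) / "episode.srt"  # no language specifier, per the workaround
        convert_file(src, dst)
        print(dst.read_bytes().decode("utf-8"))
```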


Can you please try this command for me? We have not added Korean.

C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\ffmpeg\ffmpeg20131011\ffmpeg.exe -sub_charenc iso-8859-1 -i "\\AUSTIN\media\TV Series\The Big Bang Theory\Season 1\The.Big.Bang.Theory.S01E02.720p.BluRay.x264-SiNNERS.kor.srt" "C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\cache\subtitles\1\1329e16a-7e00-b6cb-5f3f-00e0c93fbfc6.ass"

windows-1252 is our default.
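For context on the suggested switch: ISO-8859-1 assigns a character to every one of the 256 byte values, so converting from iso-8859-1 to UTF-8 should never hit the "Unable to recode" error. That alone does not turn the bytes back into Korean text, though. A short Python illustration, using Python's iso-8859-1 codec as a stand-in for ffmpeg's conversion:

```python
# ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same value,
# so decoding can never fail.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
print(len(decoded), all(ord(ch) == b for ch, b in zip(decoded, all_bytes)))

# A UTF-8 Korean line also decodes without error, but the result is a run of
# Latin characters, not the original text.
line = "자, 여기"
raw = line.encode("utf-8")
latin1 = raw.decode("iso-8859-1")
print(repr(latin1))
print("matches original:", latin1 == line)
```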

