
correct char encoding for subtitle conversion

subtitle

Best Answer Tikuf , 30 October 2013 - 03:14 AM

Cool, I will submit a patch for Luke shortly.

8 replies to this topic

#1 thoror OFFLINE  

thoror

    Advanced Member

  • Members
  • 41 posts
  • Local time: 03:34 PM

Posted 30 October 2013 - 01:12 AM

I have many Korean subtitles, and they are usually encoded in CP949.

When the subtitle is named with the .kor language specifier, the ffmpeg command uses the following:

(with option -sub_charenc windows-1252)

C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\ffmpeg\ffmpeg20131011\ffmpeg.exe -sub_charenc windows-1252 -i "\\AUSTIN\media\TV Series\The Big Bang Theory\Season 1\The.Big.Bang.Theory.S01E02.720p.BluRay.x264-SiNNERS.kor.srt" "C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\cache\subtitles\1\1329e16a-7e00-b6cb-5f3f-00e0c93fbfc6.ass"

Thus, the resulting .ass file contains a bunch of gibberish:

Dialogue: 0,0:00:06.67,0:00:07.95,Default,글쎄..
Dialogue: 0,0:00:07.95,0:00:12.04,Default,언제 부풀어오를지 모르니까\N다들 하워드 잘 보고 있어
Dialogue: 0,0:00:33.65,0:00:34.57,Default,내가 나갈게
Dialogue: 0,0:00:35.25,0:00:36.95,Default,혹시 나 부었어? 부어오른 거 같아
Dialogue: 0,0:00:38.86,0:00:40.43,Default,- 레너드, 안녕하세요\N- 안녕하세요, 페니
Dialogue: 0,0:01:04.02,0:01:07.04,Default,편한대로 부르세요\N어차피 전 최저임금 받으니까요

This is true whether I encode the .srt file in CP949 or UTF-8.

Furthermore, when using UTF-8, a bunch of errors can be seen in the sub-convert log:

[subrip @ 003dbdc0] Unable to recode subtitle event "자, 여기
땅콩 뺀 <font color="#FFD700">팻 타이</font>" from windows-1252 to UTF-8
Error while decoding stream #0:0: Error number -42 occurred
[subrip @ 003dbdc0] Unable to recode subtitle event "그런데 땅콩 기름 들어간 거 아니야?" from windows-1252 to UTF-8
Error while decoding stream #0:0: Error number -42 occurred
[subrip @ 003dbdc0] Unable to recode subtitle event "벌 날아다니는 때는 아니니까
내 <font color="#FFD700">에피네프린을 쓰게 해 줄게</font>" from windows-1252 to UTF-8
Error while decoding stream #0:0: Error number -42 occurred
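For anyone wondering where the gibberish and the recode errors come from, here is a small Python sketch (it is not Media Browser or ffmpeg code). It assumes the sample lines above are UTF-8 bytes being read as windows-1252, which matches the first Dialogue line character for character. Python's cp1252 codec only stands in for whatever converter ffmpeg uses; it rejects the few byte values that windows-1252 leaves undefined, which is a plausible (but unconfirmed) reason why some events fail to recode while others merely come out garbled.

```python
# Sketch: what happens when UTF-8 Korean text is read as windows-1252.
# Python's "cp1252" codec stands in for ffmpeg's converter (an assumption).


def read_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as windows-1252."""
    return text.encode("utf-8").decode("cp1252")


# Every UTF-8 byte of this string has a windows-1252 meaning, so the
# conversion "succeeds" and simply produces gibberish.
garbled = read_as_cp1252("글쎄..")
print(garbled)

# The UTF-8 form of "자" ends in byte 0x90, which windows-1252 leaves
# undefined, so a strict conversion fails instead of producing gibberish.
try:
    read_as_cp1252("자, 여기")
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print("cannot recode:", err.reason)
```

Either way the fix is the same: the converter has to be told the real source encoding instead of windows-1252.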

Would it be possible to set -sub_charenc to the correct value for the specific language (I believe this issue is not limited to Korean), or even support another tag such as .kor.utf8, which also gives information on the character encoding?
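To make the request concrete, here is a rough Python sketch of the lookup being asked for. It is only an illustration and not how Media Browser builds the command: the table, the tag names, and the function names are all made up for this example. The only mapping backed by this thread is kor to cp949, with windows-1252 as the existing default.

```python
# Sketch: choose a -sub_charenc value per subtitle language.
# Only "kor" -> "cp949" comes from this thread; everything else here
# (names, tags, table layout) is a hypothetical illustration.
from typing import List, Optional

DEFAULT_CHARENC = "windows-1252"      # the current default per this thread
LANGUAGE_CHARENC = {"kor": "cp949"}   # extend with other languages as needed
ENCODING_TAGS = {"utf8": "utf-8"}     # hypothetical tag, as in ".kor.utf8.srt"


def pick_charenc(language: str, encoding_tag: Optional[str] = None) -> str:
    """An explicit encoding tag wins, then the language table, then the default."""
    if encoding_tag and encoding_tag.lower() in ENCODING_TAGS:
        return ENCODING_TAGS[encoding_tag.lower()]
    return LANGUAGE_CHARENC.get(language.lower(), DEFAULT_CHARENC)


def build_args(ffmpeg: str, src: str, dst: str, language: str,
               encoding_tag: Optional[str] = None) -> List[str]:
    """Build an argument list shaped like the conversion command in this post."""
    return [ffmpeg, "-sub_charenc", pick_charenc(language, encoding_tag),
            "-i", src, dst]


# Example with placeholder paths:
print(build_args("ffmpeg.exe", "episode.kor.srt", "episode.ass", "kor"))
```

Passing the result as a list (for example to subprocess.run) avoids quoting problems with paths that contain spaces.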

 

As a workaround, if I do not include the .kor language specifier, then the command is:

C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\ffmpeg\ffmpeg20131011\ffmpeg.exe -ss 14.0657032 -i "\\AUSTIN\media\TV Series\The Big Bang Theory\Season 1\The.Big.Bang.Theory.S01E01.720p.BluRay.x264-SiNNERS.srt" -ss 1 "C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\cache\subtitles\e\e9430590-01c3-b802-8909-f172b8fdb71b.ass"

which does not contain the -sub_charenc windows-1252 option and works flawlessly if I have my .srt file pre-encoded in UTF-8.
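For the pre-encoding step of this workaround, a minimal Python sketch follows. It assumes the source file really is CP949; the function name and the file names in the usage comment are placeholders, not anything from Media Browser.

```python
# Sketch: re-encode a CP949 subtitle file as UTF-8 for the workaround above.
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp949") -> None:
    """Read src in source_encoding and write the same text to dst as UTF-8.

    Decoding is strict, so a file that is not valid CP949 raises
    UnicodeDecodeError instead of silently producing garbage.
    Working on bytes keeps the original line endings untouched.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Usage (placeholder file names):
# convert_to_utf8("episode.kor.srt", "episode.srt")
```

Writing the output under a name without the .kor specifier matches the workaround described above.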


Edited by thoror, 30 October 2013 - 01:18 AM.


#2 Tikuf OFFLINE  

Tikuf

    Obsolete User

  • Members
  • 3626 posts
  • Local time: 08:34 AM

Posted 30 October 2013 - 01:39 AM

Can you try this command for me, please? We have not added Korean.

C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\ffmpeg\ffmpeg20131011\ffmpeg.exe -sub_charenc iso-8859-1 -i "\\AUSTIN\media\TV Series\The Big Bang Theory\Season 1\The.Big.Bang.Theory.S01E02.720p.BluRay.x264-SiNNERS.kor.srt" "C:\Users\thoror\AppData\Roaming\MediaBrowser-Server\cache\subtitles\1\1329e16a-7e00-b6cb-5f3f-00e0c93fbfc6.ass"

windows-1252 is our default.


Edited by Tikuf, 30 October 2013 - 01:41 AM.


#3 thoror OFFLINE  


Posted 30 October 2013 - 02:03 AM

Hi Tikuf,

 

-sub_charenc iso-8859-1 does not work, but -sub_charenc cp949 works.

 

Thanks!!



#4 Tikuf OFFLINE  


Posted 30 October 2013 - 02:33 AM

OK, thanks. Can you please make one of your .srt files available for download so I can test?



#5 thoror OFFLINE  


Posted 30 October 2013 - 02:51 AM

Here is one sample file.

 

https://www.dropbox....SiNNERS.kor.srt



#6 Tikuf OFFLINE  


Posted 30 October 2013 - 03:06 AM

Does this look right?

 

[Attached image: 5270b001245e6_korsubs.jpg]



#7 thoror OFFLINE  


Posted 30 October 2013 - 03:10 AM

Yup! :)



#8 Tikuf OFFLINE  


Posted 30 October 2013 - 03:14 AM   Best Answer

Cool, I will submit a patch for Luke shortly.



#9 thoror OFFLINE  


Posted 30 October 2013 - 03:18 AM

Thanks so much for the prompt fix!!!






