Wrong characters in subtitles

subtitles

14 replies to this topic

#1 phantasm79666 OFFLINE  

phantasm79666

    Newbie

  • Members
  • 8 posts
  • Local time: 01:34 AM

Posted 16 August 2019 - 04:24 PM

Hello!

 

I get wrong characters in subtitles on my LG (Netcast) TV when playing back over DLNA from Emby Server 4.2.1 on Windows 10.

 

The subtitles are downloaded from Open Subtitles and are stored in ANSI encoding:

 

2019-08-16 01:20:52.834 Info HttpClient: POST https://api.opensubtitles.org/xml-rpc
2019-08-16 01:20:52.968 Info SubtitleManager: Saving subtitles to D:\Sorozatok\Downton Abbey\S05\downton.abbey.s05e01.hdtv.x264.tla.hu.srt
 
Attached File  orig.srt   92.45KB   0 downloads

 

When I play it back, accented characters are wrong.

orig.jpg

 

If I open the downloaded subtitle file in Notepad++ and save it as UTF-8 with BOM, it plays back correctly.

Attached File  utf8bom.srt   95.72KB   0 downloads
utf8bom.jpg

 

Is there a solution? I'd like to avoid manually editing all downloaded subtitles.

Edited by phantasm79666, 16 August 2019 - 04:31 PM.


#2 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 135740 posts
  • Local time: 08:34 PM

Posted 16 August 2019 - 04:26 PM

Hi there, please attach the Emby server log. Thanks.



#3 phantasm79666 OFFLINE  

Posted 16 August 2019 - 04:32 PM

Hi there, please attach the Emby server log. Thanks.

Attached 

Attached File  embyserver.zip   622.45KB   1 downloads


#4 Luke OFFLINE  

Posted 16 August 2019 - 10:41 PM

Can you please zip up the original subtitles and attach them here? Thanks.



#5 phantasm79666 OFFLINE  

Posted 17 August 2019 - 02:29 AM

Attached 

Attached File  subtitles-orig.zip   36.42KB   1 downloads

 


#6 Luke OFFLINE  

Posted 17 August 2019 - 01:37 PM

What time in the video do those screenshots correspond to?



#7 phantasm79666 OFFLINE  

Posted 17 August 2019 - 03:12 PM

1.
orig.jpg

The original, with wrong accented characters (by the way, all accented characters display wrong):

64
00:03:52,810 --> 00:03:56,520
Ó, de jó!
Szívből gratulálok!
 
2.
utf8bom.jpg
The one saved as UTF-8 with BOM is identical to the screenshot (all good throughout the video):
326
00:16:51,500 --> 00:16:52,810
Köszönöm!
 


#8 Luke OFFLINE  

Posted 18 August 2019 - 05:55 PM

How does the web app compare?



#9 phantasm79666 OFFLINE  

Posted 20 August 2019 - 03:00 AM

I tested the original srt and the modified UTF-8 with BOM srt in the web player in Safari on iOS 12.4 and Chrome 76.0.3809.100 on Windows.

 

The results are the same in both browsers.

 

Both versions are readable. I noticed differences for only two characters:

With the original srt, "û" is displayed instead of "ű".

With the original srt, "ô" is displayed instead of "ő".

 

It looks to me like the web client picks the wrong encoding too: it uses ISO-8859-1 instead of ISO-8859-2.

The difference is not major and the text is still readable, but the UTF-8 with BOM version is how it should really look in Hungarian.
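
The ISO-8859-1 versus ISO-8859-2 mix-up can be reproduced in a few lines of Python. This is only an illustration of the byte mapping, not anything Emby itself runs. Note that under ISO-8859-1 the byte used for "ő" comes out as "õ" (o with tilde), which is easy to mistake for "ô" on screen.

```python
# Hungarian letters that exist in ISO-8859-2 but not in ISO-8859-1.
text = "ő ű"

# Bytes as they would sit in an ISO-8859-2 encoded .srt file.
raw = text.encode("iso-8859-2")

# Decoding with the wrong table (ISO-8859-1) yields look-alike letters.
wrong = raw.decode("iso-8859-1")

# Decoding with the right table restores the original text.
right = raw.decode("iso-8859-2")

print(raw.hex(), wrong, right)
```

The same bytes are valid in both single-byte encodings, so nothing fails; the reader just gets neighbouring glyphs.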

 

According to my tests, UTF-8 with BOM makes all subtitles display correctly on both DLNA and web clients.

It looks like the DLNA client in the TV assumes srt is always UTF-8 encoded.

 

I researched a bit more, and it looks like I am having a problem similar to this issue:

 https://emby.media/c...lna-lg-netcast/

However, in my case the correct subtitle files are sent (the ones with the hu suffix); the only problem is the character encoding.

 

I'd prefer this to be sorted out in Emby, if possible, rather than manually converting all non-UTF-8 srt files to UTF-8. Is there a magic setting that does this, or a way to force subtitles to UTF-8 in the DLNA profile?


Edited by phantasm79666, 20 August 2019 - 03:07 AM.


#10 Luke OFFLINE  

Posted 20 August 2019 - 01:05 PM

It's not an easy answer. You can't just force something to utf8. You have to know the encoding of the input file in order to be able to convert it.
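
As a minimal sketch of why the source encoding matters (the function name `to_utf8_bom` and the sample text are made up for illustration, and this is not Emby's code): the conversion is just decode then re-encode, and a wrong guess for the source encoding does not raise an error, it silently produces wrong text.

```python
import codecs

def to_utf8_bom(data: bytes, source_encoding: str) -> bytes:
    """Re-encode subtitle bytes as UTF-8 with a BOM, given a known source encoding."""
    text = data.decode(source_encoding)  # the result depends entirely on this choice
    return codecs.BOM_UTF8 + text.encode("utf-8")

# Sample bytes as an ISO-8859-2 subtitle file would contain them.
latin2_bytes = "Köszönöm, ő az!".encode("iso-8859-2")

good = to_utf8_bom(latin2_bytes, "iso-8859-2")  # correct source encoding
bad = to_utf8_bom(latin2_bytes, "iso-8859-1")   # wrong guess, but no exception

print(good.decode("utf-8-sig"))
print(bad.decode("utf-8-sig"))
```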

#11 phantasm79666 OFFLINE  

Posted 22 August 2019 - 04:37 PM

Thanks Luke!

 

I understand... 

 

It's clear to me now that this is a "fault" of the subtitles provided for the videos. It's a bit of a problem that most of the Hungarian subtitles I have are affected :(

 

Maybe a plugin could be made to fix the files whose encoding can be detected; that could be useful for many of us with the same problem.

 

I'll experiment with some tools to detect encoding and then write a script to convert my subtitles, probably running it as a scheduled task.



#12 phantasm79666 OFFLINE  

Posted 22 August 2019 - 05:50 PM

Hi Luke,

 

I just compiled uchardet on Windows: https://www.freedesk...tware/uchardet/

For all the broken subtitle files it reliably reports:

- ISO-8859-2

And for all the working subtitle files:

- UTF-8

 

A feature that converts subtitle files to UTF-8 when the detected encoding matches a configured list would probably solve this issue.

 

Can you build something like this into Emby?
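
A rough sketch of the "convert only if the detected encoding is on a list" idea, in Python. Everything here is an assumption for illustration: the `CONVERTIBLE` list, the function names, and especially `naive_detect`, which is a crude stand-in for a real detector such as uchardet (it only checks whether the bytes are valid UTF-8 and otherwise assumes ISO-8859-2).

```python
import codecs
from typing import Callable

# Hypothetical allow-list: only these detected encodings get converted.
CONVERTIBLE = {"ISO-8859-2", "WINDOWS-1250"}

def naive_detect(data: bytes) -> str:
    """Stand-in detector: valid UTF-8 is reported as UTF-8, anything else as ISO-8859-2."""
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "ISO-8859-2"

def convert_if_listed(data: bytes, detect: Callable[[bytes], str] = naive_detect) -> bytes:
    """Return UTF-8 (with BOM) bytes if the detected encoding is on the list, else the input."""
    detected = detect(data).upper()
    if detected not in CONVERTIBLE:
        return data  # unknown or already fine: leave the file alone
    return codecs.BOM_UTF8 + data.decode(detected).encode("utf-8")

sample = "Szívből gratulálok!".encode("iso-8859-2")
print(convert_if_listed(sample).decode("utf-8-sig"))
```

Passing the detector in as a parameter keeps the list check separate from the detection step, so a better detector can be swapped in without touching the conversion.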



#13 Luke OFFLINE  

Posted 22 August 2019 - 08:55 PM

We already have a C# port of that built into the server for encoding detection, and most of the time it is pretty accurate. Ours might be based on an older version, though.

#14 phantasm79666 OFFLINE  

Posted 24 August 2019 - 04:06 PM   Best Answer

Attached are the scripts I made using Windows versions of iconv and uchardet.

 

1. Unzip the attached file convertsrt.zip (1.59MB)

2. Edit bin\convertsrt.bat to specify the folders to scan and which encoding to convert to UTF-8
3. Run bin\convertsrt.bat

 

Should work on Windows 10.
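
For anyone not on Windows, the same scan-and-convert idea can be sketched in Python. This is not the contents of the attached convertsrt.bat; the function name `convert_folder` and the default source encoding are assumptions, and instead of calling uchardet it simply treats any file that is not valid UTF-8 as being in the given source encoding. It overwrites files in place, so back up the folder first.

```python
import codecs
from pathlib import Path

def convert_folder(root: Path, source_encoding: str = "iso-8859-2") -> list[Path]:
    """Rewrite every non-UTF-8 .srt under root as UTF-8 with BOM; return the changed files."""
    changed = []
    for srt in sorted(root.rglob("*.srt")):
        data = srt.read_bytes()
        try:
            data.decode("utf-8")
            continue  # already valid UTF-8 (with or without BOM): leave untouched
        except UnicodeDecodeError:
            pass
        text = data.decode(source_encoding)  # assumed encoding, not detected
        srt.write_bytes(codecs.BOM_UTF8 + text.encode("utf-8"))
        changed.append(srt)
    return changed
```

Calling `convert_folder(Path(...))` on a media folder returns the list of files it rewrote; running it a second time changes nothing, because the converted files are then valid UTF-8.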

Edited by phantasm79666, 24 August 2019 - 04:07 PM.


#15 Luke OFFLINE  

Posted 24 August 2019 - 04:22 PM

Great stuff, thanks!

