Wrong characters in subtitles


phantasm79666
Go to solution Solved by phantasm79666,

phantasm79666

Hello!

 

I get wrong characters in subtitles on my LG (Netcast) TV when playing over DLNA from Emby Server 4.2.1 on Windows 10.

 

The subtitles are downloaded from Open Subtitles and are stored in an ANSI (single-byte, non-UTF-8) encoding:

 

2019-08-16 01:20:52.834 Info HttpClient: POST https://api.opensubtitles.org/xml-rpc
2019-08-16 01:20:52.968 Info SubtitleManager: Saving subtitles to D:\Sorozatok\Downton Abbey\S05\downton.abbey.s05e01.hdtv.x264.tla.hu.srt
 

When I play it back, the accented characters are wrong.

post-501213-0-66371700-1565986566_thumb.jpg

 

If I open the downloaded subtitle file in Notepad++ and save it as UTF-8 with BOM, it plays back correctly.

post-501213-0-65155100-1565986704_thumb.jpg

 

Is there a solution? I'd like to avoid manually editing every downloaded subtitle file.

orig.srt

utf8bom.srt

embyserver.zip


phantasm79666
1.
post-501213-0-66371700-1565986566_thumb.jpg

The original, with the wrong accents (by the way, all accented characters display wrong):

64
00:03:52,810 --> 00:03:56,520
Ó, de jó!
Szívből gratulálok!
 
2.
post-501213-0-65155100-1565986704_thumb.jpg
The UTF-8 with BOM version matches the screenshot (everything is correct throughout the video):
326
00:16:51,500 --> 00:16:52,810
Köszönöm!
 

phantasm79666

I have tested the original SRT and the modified UTF-8 with BOM SRT in the web player, in Safari on iOS 12.4 and in Chrome 76.0.3809.100 on Windows.

 

The results are the same in both browsers.

 

Both versions are readable. I noticed differences for only two characters:

In the original SRT, "û" is displayed instead of "ű"

In the original SRT, "ô" is displayed instead of "ő"

 

It looks to me like the web client also picks the wrong encoding: it uses ISO-8859-1 instead of ISO-8859-2.

The difference is minor and the text is still readable, but the UTF-8 with BOM version is how it should really look in Hungarian.
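To illustrate why only a couple of letters change in the web player, here is a minimal sketch that assumes the file really is ISO-8859-2 and is being decoded as ISO-8859-1. The helper name `misread` and the letter list are made up for this example. Under that assumption only ő and ű (and their capitals) come out differently, and ő becomes "õ" (o with tilde), which is easy to mistake for "ô" on screen.

```python
# Hungarian accented letters, lowercase and uppercase.
HUNGARIAN = "áéíóöőúüűÁÉÍÓÖŐÚÜŰ"


def misread(text: str, actual: str = "iso-8859-2", assumed: str = "iso-8859-1") -> str:
    """Encode text in the `actual` encoding, then decode the bytes as `assumed`."""
    return text.encode(actual).decode(assumed)


# Keep only the letters that change when Latin-2 bytes are read as Latin-1.
changed = {ch: misread(ch) for ch in HUNGARIAN if misread(ch) != ch}
print(changed)
```

All other Hungarian accented letters share the same byte value in both encodings, which would explain why the text stays mostly readable.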

 

According to my tests, saving as UTF-8 with BOM makes all subtitles display correctly on both DLNA and web clients.

It looks like the DLNA client in the TV assumes that SRT files are always UTF-8 encoded.
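As a rough illustration of that assumption, the sketch below decodes ISO-8859-2 bytes as if they were UTF-8. How the TV actually handles invalid bytes is unknown; Python's `errors="replace"` is only a stand-in that marks each undecodable byte with U+FFFD. It also shows that the "UTF-8 with BOM" variant starts with the bytes EF BB BF.

```python
import codecs

text = "Köszönöm!"

latin2_bytes = text.encode("iso-8859-2")   # how the downloaded file stores the line
utf8_bom_bytes = text.encode("utf-8-sig")  # what "save as UTF-8 with BOM" produces

# A decoder that assumes UTF-8 cannot interpret the Latin-2 byte for "ö" (0xF6).
as_seen = latin2_bytes.decode("utf-8", errors="replace")
print(as_seen)

# The BOM variant begins with EF BB BF and decodes cleanly.
has_bom = utf8_bom_bytes.startswith(codecs.BOM_UTF8)
print(has_bom, utf8_bom_bytes.decode("utf-8-sig"))
```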

 

I researched a bit more, and it looks like I'm having a problem similar to this issue:

 https://emby.media/community/index.php?/topic/66351-subtitle-encoding-issues-on-dlna-lg-netcast/

However, in my case the correct subtitle files are sent (the ones with the hu suffix); the only problem is the character encoding.

 

I'd prefer this to be sorted out in Emby, if possible, rather than manually converting all my non-UTF-8 SRT files to UTF-8. Is there a magic setting that does this, or a way to force subtitles to UTF-8 in the DLNA profile?


It's not an easy answer. You can't just force something to utf8. You have to know the encoding of the input file in order to be able to convert it.
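A small sketch of why the source encoding matters: converting means decoding with the source encoding and re-encoding as UTF-8, and a wrong guess does not raise an error, it just produces different text. The function name `to_utf8_bom` is hypothetical and not part of Emby.

```python
def to_utf8_bom(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes with the given source encoding, re-encode as UTF-8 with BOM."""
    return raw.decode(source_encoding).encode("utf-8-sig")


raw = "Szívből".encode("iso-8859-2")

right = to_utf8_bom(raw, "iso-8859-2")  # correct source encoding
wrong = to_utf8_bom(raw, "iso-8859-1")  # wrong guess, still succeeds silently

print(right.decode("utf-8-sig"))
print(wrong.decode("utf-8-sig"))
```

Both results are valid UTF-8, so nothing downstream can tell that the second one was converted from the wrong assumption.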


phantasm79666

Thanks Luke!

 

I understand... 

 

It's clear to me now that this is a "fault" of the subtitles provided for the videos. The trouble is that most of my Hungarian subtitles have this issue :(

 

Maybe a plugin could be made to fix those where encoding can be detected and that could be useful for many of us with the same problem.

 

I'll experiment with some tools to detect encoding and then write a script to convert my subtitles, probably running it as a scheduled task.


phantasm79666

Hi Luke,

 

I just compiled uchardet on Windows: https://www.freedesktop.org/wiki/Software/uchardet/

For all the broken subtitle files it reliably reports:

- ISO-8859-2

And for all the working subtitle files:

- UTF-8

 

A feature that converts subtitle files to UTF-8 when the detected encoding matches a configured list would probably solve this issue.
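The idea could be sketched roughly as follows. This is not Emby code: `CONVERTIBLE` and `convert_if_listed` are hypothetical names, and the detected encoding is assumed to arrive as a string from an external detector such as uchardet.

```python
import codecs

# Assumption: encodings we are willing to convert automatically.
CONVERTIBLE = {"ISO-8859-2"}


def normalize(name: str) -> str:
    """Map an encoding label to Python's canonical codec name."""
    return codecs.lookup(name).name


def convert_if_listed(raw: bytes, detected: str, convertible=CONVERTIBLE) -> bytes:
    """Return raw re-encoded as UTF-8 with BOM if `detected` is on the list.

    Anything else, including labels Python does not know, is returned unchanged.
    """
    allowed = {normalize(n) for n in convertible}
    try:
        codec = normalize(detected)
    except LookupError:
        return raw
    if codec not in allowed:
        return raw
    return raw.decode(codec).encode("utf-8-sig")


sample = "Köszönöm!".encode("iso-8859-2")
print(convert_if_listed(sample, "ISO-8859-2").decode("utf-8-sig"))
```

Restricting conversion to a list keeps a misdetection from rewriting files whose encoding nobody has checked.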

 

Can you build something like this into Emby?


We already have a C# port of that built into the server for encoding detection, and most of the time it is pretty accurate. Ours might be based on an older version, though.


  • Solution
phantasm79666

Attached are the scripts I made using the Windows versions of iconv and uchardet.

 

1. Unzip convertsrt.zip
2. Edit bin\convertsrt.bat to specify the folders to scan and which encoding should be converted to UTF-8
3. Run bin\convertsrt.bat

 

Should work on Windows 10.
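For readers without the attachment, the same workflow could look roughly like this in Python. This is not the content of convertsrt.bat: it swaps uchardet detection for a simple "is it already valid UTF-8?" check, and the function names and the `.orig` backup suffix are assumptions made for the sketch.

```python
from pathlib import Path


def is_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (pure ASCII also counts)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_folder(root: Path, source_encoding: str) -> list:
    """Convert every non-UTF-8 .srt file under root to UTF-8 with BOM.

    Files that already decode as UTF-8 are left alone. The original bytes
    are kept next to each converted file with an added ".orig" suffix.
    """
    converted = []
    for path in sorted(root.rglob("*.srt")):
        raw = path.read_bytes()
        if is_utf8(raw):
            continue
        text = raw.decode(source_encoding)
        path.with_name(path.name + ".orig").write_bytes(raw)
        path.write_bytes(text.encode("utf-8-sig"))
        converted.append(path)
    return converted
```

A call such as `convert_folder(Path(r"D:\Sorozatok"), "iso-8859-2")` would process the library folder from the log above. The source encoding still has to be chosen by hand, so it is worth testing on a copy first.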

convertsrt.zip
