Wrong characters in subtitles

August 16, 2019

Hello!

I'm getting wrong characters in subtitles on my LG (Netcast) TV during DLNA playback from Emby Server 4.2.1 on Windows 10.

The subtitles are downloaded from Open Subtitles and are stored in ANSI encoding:

2019-08-16 01:20:52.834 Info HttpClient: POST https://api.opensubtitles.org/xml-rpc

2019-08-16 01:20:52.968 Info SubtitleManager: Saving subtitles to D:\Sorozatok\Downton Abbey\S05\downton.abbey.s05e01.hdtv.x264.tla.hu.srt

orig.srt

If I play it back, the accented characters are wrong.

If I edit the downloaded subtitle file with Notepad++ and save it as UTF-8 with BOM, then it plays back correctly.

utf8bom.srt

Is there a solution? I'd like to avoid manually editing all downloaded subtitles.

embyserver.zip

Edited August 16, 2019 by phantasm79666

August 16, 2019

Hi there, please attach the Emby server log. Thanks.

August 16, 2019

Attached

embyserver.zip

August 17, 2019

Can you please zip up the original subtitles and attach them here? Thanks.

August 17, 2019

Attached

subtitles-orig.zip

August 17, 2019

What time in the video do those screenshots correspond to?

August 17, 2019

1.

The original with the wrong accents (by the way, all accented characters display wrong):

64

00:03:52,810 --> 00:03:56,520

Ó, de jó!

Szívből gratulálok!

2.

The one with UTF-8 BOM is identical to the screenshot (all good in the whole video):

326

00:16:51,500 --> 00:16:52,810

Köszönöm!

August 18, 2019

How does the web app compare?

August 20, 2019

I have tested the original srt and the modified UTF-8 BOM encoded srt in the web player in Safari on iOS 12.4 and Chrome 76.0.3809.100 on Windows.

The results are the same in both browsers.

Both versions are readable; I noticed differences for only two characters.

In orig srt "û" displayed instead of "ű"

In orig srt "ô" displayed instead of "ő"

It looks to me like the web client picks the wrong encoding too: it uses ISO-8859-1 instead of ISO-8859-2.

The difference is not major and the text is still readable, but the UTF-8 BOM encoded version is how it should really look in Hungarian.

According to my tests, UTF-8 with BOM makes all subtitles display correctly on both DLNA and web clients.

It looks like the DLNA client in the TV assumes srt is always UTF-8 encoded.
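
The symptoms above are consistent with how the same bytes decode under different assumed encodings. Here is a minimal Python sketch using only the standard codecs. It assumes the "ANSI" file is really ISO-8859-2, and it does not reproduce what Emby, the browsers, or the TV do internally.

```python
# Sample text: ö is at the same byte position in ISO-8859-1 and
# ISO-8859-2, while ő and ű exist only in ISO-8859-2.
text = "Köszönöm! ő ű"

# Assumption: this is how the downloaded "ANSI" file is stored on disk.
raw = text.encode("iso-8859-2")

# A client that wrongly assumes ISO-8859-1: only ő and ű change.
as_latin1 = raw.decode("iso-8859-1")
print(as_latin1)  # Köszönöm! õ û

# A client that assumes UTF-8: every accented byte is invalid and is
# replaced with U+FFFD when decoding with errors="replace".
as_utf8 = raw.decode("utf-8", errors="replace")
print(as_utf8)
```

Most Hungarian accented letters (á, é, í, ó, ö, ú, ü) share the same byte value in both ISO-8859 variants, which is why a client guessing ISO-8859-1 stays readable. The byte for ő is 0xF5, which ISO-8859-1 maps to õ (o with tilde), and the byte for ű is 0xFB, which it maps to û.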

I researched a bit more, and it looks like I'm having a problem similar to this issue:

https://emby.media/community/index.php?/topic/66351-subtitle-encoding-issues-on-dlna-lg-netcast/

However, in my case the correct subtitle files are sent (the ones with the hu suffix); the only problem is the character encoding.

I'd prefer this to be sorted out in Emby if possible, rather than manually converting all non-UTF-8 srt files to UTF-8. Is there a magic setting that does this, or a way to force subtitles to UTF-8 in the DLNA profile?

Edited August 20, 2019 by phantasm79666

August 20, 2019

It's not an easy answer. You can't just force something to utf8. You have to know the encoding of the input file in order to be able to convert it.
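
To illustrate why the input encoding has to be known, here is a rough Python sketch. The function name `convert_to_utf8_bom` and the sample file names are made up for the example; this is not how Emby handles subtitles. Decoding with the wrong single-byte encoding does not raise an error, it just silently produces different text.

```python
import tempfile
from pathlib import Path


def convert_to_utf8_bom(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode a subtitle file as UTF-8 with BOM.

    source_encoding must be supplied by the caller; a plain .srt file
    does not record which encoding it was written in.
    """
    text = src.read_bytes().decode(source_encoding)
    # "utf-8-sig" prepends the byte order mark EF BB BF when encoding.
    dst.write_bytes(text.encode("utf-8-sig"))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.hu.srt"        # hypothetical file names
    dst = Path(tmp) / "sample.utf8.hu.srt"
    src.write_bytes("Szívből gratulálok!".encode("iso-8859-2"))

    convert_to_utf8_bom(src, dst, "iso-8859-2")
    out = dst.read_bytes()
    has_bom = out.startswith(b"\xef\xbb\xbf")
    result = out.decode("utf-8-sig")

    # The same bytes decoded with a wrong guess: no error, wrong text.
    wrong = src.read_bytes().decode("iso-8859-1")

print(result)  # Szívből gratulálok!
print(wrong)   # Szívbõl gratulálok!
```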

August 22, 2019

Thanks Luke!

I understand...

It's clear to me now that this is a "fault" of the subtitles provided for the videos. It's kind of a problem that most of the Hungarian subtitles I have are affected by this issue.

Maybe a plugin could be made to fix the files whose encoding can be detected; that could be useful for many of us with the same problem.

I'll experiment with some tools to detect encoding and then write a script to convert my subtitles, probably running it as a scheduled task.

August 22, 2019

Hi Luke,

I just compiled uchardet on Windows: https://www.freedesktop.org/wiki/Software/uchardet/

It reliably outputs for all wrong subtitle files:

- ISO-8859-2

And for all working subtitle files

- UTF-8

A feature that converts subtitle files to UTF-8 when the detected encoding matches a configurable list would probably solve this issue.

Can you build something like this into Emby?
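
The idea of "convert only when the detected encoding is on a list" could look roughly like the Python sketch below. The `guess_encoding` function is a crude stand-in for a real detector such as uchardet (it only checks whether the bytes are valid UTF-8 and otherwise returns an assumed fallback), and `CONVERT_FROM` and `to_utf8_if_listed` are hypothetical names for the example, not Emby settings or APIs.

```python
import codecs

# Hypothetical allow-list of detected encodings that should be converted.
CONVERT_FROM = {"ISO-8859-2", "WINDOWS-1250"}


def guess_encoding(data: bytes, fallback: str = "ISO-8859-2") -> str:
    """Crude stand-in for a real detector such as uchardet.

    Returns "UTF-8" if the bytes are valid UTF-8 (with or without BOM),
    otherwise the caller-supplied fallback, which is an assumption.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return fallback


def to_utf8_if_listed(data: bytes, allowed=CONVERT_FROM) -> bytes:
    """Convert to UTF-8 with BOM only if the detected encoding is allowed."""
    detected = guess_encoding(data)
    if detected.upper() not in allowed:
        return data  # leave files with other encodings untouched
    return data.decode(detected).encode("utf-8-sig")


legacy = "Köszönöm!".encode("iso-8859-2")
already_utf8 = "Köszönöm!".encode("utf-8")

converted = to_utf8_if_listed(legacy)
untouched = to_utf8_if_listed(already_utf8)
print(converted.decode("utf-8-sig"))  # Köszönöm!
```

Leaving anything outside the list untouched keeps a wrong guess from damaging files that already work.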

August 23, 2019

We have a C# port of that already built into the server for encoding detection, and most of the time it is pretty accurate. Ours might be based on an older version though.

August 24, 2019

Attached are the scripts I made using the Windows versions of iconv and uchardet.

1. Unzip convertsrt.zip

2. Edit bin\convertsrt.bat to specify the folders to scan and the encoding to convert to UTF-8

3. Run bin\convertsrt.bat

Should work on Windows 10.
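
For anyone who prefers not to use batch files, the same scan-and-convert idea can be sketched in Python. This is not the content of convertsrt.zip; `convert_folder` is a hypothetical helper, and it replaces uchardet with a simple "is it already valid UTF-8?" check plus a source encoding supplied by the caller. It rewrites files in place, so try it on a copy first.

```python
import tempfile
from pathlib import Path
from typing import List


def convert_folder(folder: Path, source_encoding: str) -> List[Path]:
    """Convert every non-UTF-8 .srt file under folder to UTF-8 with BOM.

    Files are rewritten in place. Returns the list of files changed.
    """
    changed = []
    for srt in sorted(folder.rglob("*.srt")):
        data = srt.read_bytes()
        try:
            data.decode("utf-8")  # already UTF-8 or plain ASCII: skip
            continue
        except UnicodeDecodeError:
            pass
        text = data.decode(source_encoding)
        srt.write_bytes(text.encode("utf-8-sig"))
        changed.append(srt)
    return changed


# Small demonstration in a temporary folder (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "S05").mkdir()
    legacy = root / "S05" / "episode.hu.srt"
    legacy.write_bytes("Ó, de jó!".encode("iso-8859-2"))
    fine = root / "already.hu.srt"
    fine.write_bytes("Köszönöm!".encode("utf-8"))

    changed_names = [p.name for p in convert_folder(root, "iso-8859-2")]
    converted_text = legacy.read_text(encoding="utf-8-sig")
    fine_bytes = fine.read_bytes()

print(changed_names)  # ['episode.hu.srt']
```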

convertsrt.zip

Edited August 24, 2019 by phantasm79666

August 24, 2019

Great stuff, thanks!
