Subtitle problem for accented characters


christophe.ferrandon

Ok, here we go. This is a bash shell script; I also have a PowerShell version in case someone wants it:

(...)

Basically you need to put this script on the root of your media folder tree. The script will go down each sub-folder, check if the sub file is in "FROM" encoding, and convert it to "TO" encoding.

Just configure those variables according to the encoding you want to use.

The script will also save a backup of your original sub file before conversion, in case something goes wrong and you want to recover your original file.

 

For sure it can be improved since I'm not using it anymore (moved to powershell since my fileserver is running Windows Server), so feel free to do it and share! :)

 

Cheers

 

Created a script to auto-detect new files and automagically change their encoding using your method.

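#!/bin/bash
# Srt Encoder
WATCH_PATH="/mnt/Storage/Movies /mnt/Storage/TVShows"
FROM="iso-8859-1"
TO="utf-8"
ICONV="iconv -f $FROM -t $TO"

# $WATCH_PATH is left unquoted on purpose so it splits into one argument per folder
# (so the watched folder paths themselves must not contain spaces).
inotifywait -m --format '%w%f' -e create -e moved_to -e modify -r $WATCH_PATH |
	while IFS= read -r file; do
		EXT=${file##*.}
		# Quote $EXT: unquoted, a path containing spaces breaks the test
		if [ "$EXT" = "srt" ]
		then
			# -b omits the file name, so only the detected type is matched
			IS_TARGET=$(file -b "${file}" | grep -i iso-8859)
			if [ "$IS_TARGET" != "" ]; then
				echo "${file} ---- Will be converted!"
				# Keep the original next to the converted file
				cp "${file}" "${file}.bak"
				$ICONV < "${file}.bak" > "${file}"
			fi
		fi
	done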

This watches the folders listed in the variable WATCH_PATH for create, move and modify events. When one of those events fires, it checks whether the file extension is .srt; if it is and the file is detected as ISO-8859, it changes the encoding.
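One possible refinement, not part of the script above: `file` only guesses the encoding from the content, so a stricter guard is to ask iconv whether the file already decodes cleanly as UTF-8 and leave it alone if it does. A minimal sketch, assuming GNU iconv (the helper name `is_valid_utf8` is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: exit status 0 when the file decodes cleanly as UTF-8.
# iconv exits non-zero when it meets a byte sequence that is invalid
# in the source encoding, so a UTF-8 to UTF-8 pass works as a validity check.
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 < "$1" > /dev/null 2>&1
}

# Possible use inside the loop (assumption, untested against the watcher):
#   if ! is_valid_utf8 "$file"; then ...convert... ; fi
```

Plain ASCII files also pass this check, which is fine because ASCII is already valid UTF-8 and needs no conversion.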

 

As this watches for changes I created a systemd service for it.

[Unit]
Description=SRT Encoder
After=network.target

[Service]
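# "-" prefix: do not fail the start if pkill finds nothing to kill (adjust the path if pkill lives elsewhere)
ExecStartPre=-/usr/bin/pkill SCRIPT_NAME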
ExecStart=SCRIPT_LOCATION
Restart=on-abort
TimeoutSec=20

[Install]
WantedBy=multi-user.target

Change SCRIPT_LOCATION to the location of the script and SCRIPT_NAME to the name of the script. This way, after enabling the service, the script will start when the server boots.

Edited by miquelx

  • 2 months later...
ravenj

Hello,

 

I know this is very simple and easy to use but couldn't it, or any other method, be easily included in emby?

 

I have both linux and windows server installations and, with windows, there's no problem with subtitles.

 

I can understand this is not a problem for most of emby's users because not everyone uses subtitles. But, in my opinion, the problem's priority shouldn't be defined according to the number of users that have this specific need.

I just don't like to see both linux and subtitle users being discriminated against.

 

Thank you.

Best regards


anderbytes


I agree with you @ravenj, I really didn't want to believe it, but being a Linux emby user sometimes ends up meaning lower priority.

 

Maybe because the devs use Windows in their tests...


  • 2 weeks later...

Ok, guys I found out how to run the script.

My question is: how do I go about modifying the output to "iso-8859-2"?

 

If I use this, it doesn't work.

FROM="iso-8859-1"
TO="iso-8859-2"

Thanks


Ok, I finally understood how this works.

This is what I need to do in order to have the subs displaying properly (in Notepad++):

First, I need to select Character set > Eastern European > ISO 8859-2.

Then I need to convert to UTF-8 and save.

 

Is there any way to automate this? 

Thanks!
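Those Notepad++ steps read the bytes as ISO-8859-2 and re-save them as UTF-8, so with the script from this thread the likely change is FROM rather than TO. Setting TO="iso-8859-2" only re-encodes Latin-1 text into Latin-2, and the output is still not UTF-8. A minimal sketch, assuming the subtitles really are ISO-8859-2 (`convert_sub` is a made-up helper name):

```shell
#!/bin/bash
FROM="iso-8859-2"   # what the file actually is
TO="utf-8"          # what the player expects

# Convert one subtitle file in place, keeping the original as <name>.bak.
convert_sub() {
    cp -- "$1" "$1.bak" || return 1
    iconv -f "$FROM" -t "$TO" < "$1.bak" > "$1"
}

# Example call (hypothetical file name):
#   convert_sub "./Some Show/episode.hu.srt"
```

As far as I know, `file` reports these files simply as "ISO-8859 text" without saying which part of the family, so the existing `grep -i iso-8859` check should still match them.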

Edited by ciss

  • 3 months later...
  • 1 month later...
Barlog9511

Ok, here we go. This is a bash shell script; I also have a PowerShell version in case someone wants it:

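#!/bin/bash
FROM=iso-8859-1
TO=utf-8
ICONV="iconv -f $FROM -t $TO"

# Convert every .srt below the current folder
# (IFS= and -r keep leading spaces and backslashes in file names intact)
find ./ -type f -name "*.srt" | while IFS= read -r fn; do

    # -b omits the file name, so only the detected type is matched
    IS_TARGET=$(file -b "${fn}" | grep -i iso-8859)

    if [ "$IS_TARGET" = "" ]; then

        echo "${fn} ---- Will NOT be converted!"

    else

        echo "${fn} ---- Will be converted!"
        # Keep the original next to the converted file
        cp "${fn}" "${fn}.bak"
        $ICONV < "${fn}.bak" > "${fn}"

    fi

done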

Basically you need to put this script on the root of your media folder tree. The script will go down each sub-folder, check if the sub file is in "FROM" encoding, and convert it to "TO" encoding.

Just configure those variables according to the encoding you want to use.

The script will also save a backup of your original sub file before conversion, in case something goes wrong and you want to recover your original file.
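If a conversion run does go wrong, the .bak files can be put back in one pass. A minimal sketch, not part of the original script (`restore_backups` is a hypothetical helper, and it overwrites the converted .srt files without asking):

```shell
#!/bin/bash
# Hypothetical helper: move every "<name>.srt.bak" below the given folder
# (default: current folder) back over "<name>.srt".
restore_backups() {
    # ${bak%.bak} strips the trailing ".bak" to get the original file name.
    find "${1:-.}" -type f -name '*.srt.bak' -exec sh -c '
        for bak in "$@"; do
            mv -f -- "$bak" "${bak%.bak}"
        done' sh {} +
}

# Example call from the root of the media tree:
#   restore_backups .
```

Passing the names through `find -exec ... {} +` instead of a `while read` loop keeps file names with spaces or odd characters intact.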

 

For sure it can be improved since I'm not using it anymore (moved to powershell since my fileserver is running Windows Server), so feel free to do it and share! :)

 

Cheers

 

I had some issues converting Czech and Slovak subtitles; in that case it is better to use the enca library, so I rewrote your original script :)

#!/bin/bash
LANGUAGE=czech
TO=utf8
CONVERT="enca -L $LANGUAGE -x $TO"

# Find and convert
find ./ -type f -name "*.srt" | while IFS= read -r fn; do

    # Pass -L here too, so detection does not depend on the shell locale
    IS_TARGET=$(enca -L "$LANGUAGE" "${fn}" | grep -o UTF-8)
    if [ "$IS_TARGET" = "UTF-8" ]; then

        echo "${fn} ---- Will NOT be converted!"

    else

        echo "${fn} ---- Will be converted!"
        # optional backup of original srt
        # cp "${fn}" "${fn}.bak"
        $CONVERT "${fn}"

    fi

done
Edited by Barlog9511

This problem was one of the main reasons that made me move to a windows server.

 

Can't believe it isn't solved yet within emby.

 

Best regards


anderbytes


If the devs lived in countries with accented chars, this would have been fixed ages ago, even if it is only a Linux issue.

I just ignore that Emby has this feature. I use subliminal and scheduled jobs to auto-download the subtitles.


anderbytes

Has been done in http://emby.media/community/index.php?/topic/7719-external-subtitles-ansi/page-2 (by April 2016)

 

Granted, the user who posted the issue gave up and didn't respond to http://emby.media/community/index.php?/topic/7719-external-subtitles-ansi/page-2&do=findComment&comment=320035 , but I can download and post one now if you want.

Edited by anderbytes

anderbytes

And more... nothing strange will appear in logs, because the subtitle is downloaded correctly.


anderbytes

Here is the one downloaded by Emby... look at times 3 and 5 in the file...

"N�o" should be "Não", for example.

 

 

A script that was already published in this forum converts from iso-8859-1 to utf-8, then the sub becomes perfectly readable in playback.
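To illustrate why the conversion fixes it (a sketch using only the "Não" example above, not the actual downloaded file): in ISO-8859-1 the letter ã is the single byte 0xE3, which is not valid on its own in UTF-8, so a UTF-8 reader shows a replacement character. After iconv it becomes the two bytes 0xC3 0xA3.

```shell
#!/bin/bash
# Hex dump of "Não" as ISO-8859-1 bytes (\343 is octal for 0xE3).
latin1_hex=$(printf 'N\343o' | od -An -tx1 | tr -d ' \n')

# The same text after conversion to UTF-8.
utf8_hex=$(printf 'N\343o' | iconv -f iso-8859-1 -t utf-8 | od -An -tx1 | tr -d ' \n')

echo "before: $latin1_hex"   # before: 4ee36f
echo "after:  $utf8_hex"     # after:  4ec3a36f
```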

 

Screenshots:

 

1. File in Linux, before manual conversion (script).

57c70afe9f797_Tela2016083113h42m001.jpg

 

2. File in Windows before manual conversion (script) - Windows seems to read it OK.

 

57c70b3720624_Tela2016083113h43m001.jpg

 

3. File in Linux after manual conversion (script).

 

57c70b53d615b_Tela2016083113h50m001.jpg

Angry.Birds.(2016).1080p.BluRay.x264.AACpt-BR.pob.zip

Edited by anderbytes

Same problem here. Is there any way to make this work without changing each srt file to utf8 manually? Emby is the only one where I have this problem; Kodi and Plex both work great. However, I would prefer Emby over Plex, I just need to get this issue with subtitles resolved. Thanks

 


 

Hi there, welcome. Can you please discuss a specific example? Thanks.


 

Hi Luke! Sure, any srt file downloaded in Hungarian from the popular sites.

 

Using Emby in Chrome, I get this:

 

57c936f19e643_screenshot_wrong_subtitles

 

The same srt file plays perfectly fine everywhere else: Kodi (Linux & Win), Plex, VLC, MX-Player, etc.

 

Here is the srt file: bojack.horseman.s01e01.720p.webrip.x264-w4f.hu.srt.7z

 

All the Hungarian characters are wrong, it should be:

- A nyomorék parkolóba álltam.
Remélem, nem gáz. - Hogy hova állt?

Any ideas?

Edited by szimat

I get that this can be resolved with a script, but don't get me wrong, I don't want to run scripts on perfectly fine subtitles. I really think this should somehow be resolved within Emby. Don't you think?

 

