Feature request: implement fuzzy search (upgrade to fuzzy search engine)

May 14, 2022

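Our beloved Emby keeps gaining features and supporting more modules: currently movies, music, live TV, and TV series, and perhaps audiobooks later. The features also keep getting better. I am a music lover and often collect classic, good songs. I am currently using the beta, and the lyrics support makes me like this software more and more. Thanks to everyone on the Emby team for working day and night to give us customers feature support and product maintenance.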

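I hope the Emby team can upgrade the search engine to fuzzy search. In my experience, search works fairly well when movie titles are in English, but very poorly when they are in Chinese.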

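For example: when searching a library of 500 movies for "美女与野兽" (Beauty and the Beast), typing any single character or word from "美女与野兽" should be enough to find it, rather than having to type the full title or its first two characters. This should extend to all libraries (music, movies, TV series, live TV, audiobooks).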

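As for not wanting adult content to show up in search results, I think a search option could be added under Media Library → New Media Library: if you don't want the movies in that library to appear in home-page search, just uncheck the box next to the search option. (I believe this feature has been mentioned before.)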

May 14, 2022

Hi, yes, we plan to improve search in future updates. Thanks for your feedback!

May 15, 2022

22 hours ago, Luke said:

Hi, yes, we plan to improve search in future updates. Thanks for your feedback!

OK, looking forward to it.

May 15, 2022

On 5/15/2022 at 1:12 AM, Luke said:

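Hi, yes, we plan to improve search in future updates. Thanks for your feedback!

Hi Luke, I have found that someone in China has already released a patch that adds fuzzy search to the beta. I haven't tested it yet, but I hear it works on some platforms. It is a patch package; could it be useful to you? Perhaps it could be added to the next beta.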

5_6179199118389282156.rar

May 16, 2022

We'd have to know exactly what they did, but I'm guessing they probably just updated the embedded sqlite build.

May 16, 2022

3 minutes ago, Luke said:

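We'd have to know exactly what they did, but I'm guessing they probably just updated the embedded sqlite build.

I think so too. They just built the feature they wanted, and they have always urged everyone to support the official licensed version. My personal view is that, since users in China need these features, I hope the official team can add them to the stable release. That would make Emby more and more complete. Thanks for the reply.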

May 27, 2022

Emby's search function is not very user-friendly, and its support for Chinese is poor. For example, it does not support Chinese character search starting from any position, nor does it support pinyin initial letter search.
Looking forward to improvements. Thank you, development team.
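
To make the two requested behaviors concrete, here is a minimal Python sketch. It is not Emby's search code. It assumes a tiny hand-written character-to-pinyin table (`PINYIN`) and hypothetical helper names (`pinyin_initials`, `title_matches`); a real implementation would need a complete pinyin dictionary.

```python
# Hypothetical, hand-written pinyin table covering only the demo title.
PINYIN = {"美": "mei", "女": "nv", "与": "yu", "野": "ye", "兽": "shou"}


def pinyin_initials(title, table=PINYIN):
    """First pinyin letter of each character; unknown characters pass through lowercased."""
    return "".join(table[ch][0] if ch in table else ch.lower() for ch in title)


def title_matches(query, title, table=PINYIN):
    """True if the query occurs at any position in the title or in its pinyin initials."""
    q = query.strip().lower()
    if not q:
        return False
    return q in title.lower() or q in pinyin_initials(title, table)


title = "美女与野兽"
print(title_matches("野兽", title))  # characters from the middle of the title
print(title_matches("yys", title))   # pinyin initials of 与野兽
print(title_matches("狮子", title))  # characters that are not in the title
```

The first two queries match and the third does not. A plain substring scan like this is fine for a sketch, but a large library would want an index instead of scanning every title.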

May 27, 2022

For Chinese splitting and word segmentation search, please refer to the github library:
https://github.com/wangfenjin/simple

https://www.wangfenjin.com/posts/simple-tokenizer/

May 28, 2022

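On 5/17/2022 at 2:31 AM, Luke said:

We'd have to know exactly what they did, but I'm guessing they probably just updated the embedded sqlite build.

On 5/27/2022 at 2:50 PM, wolong_zb said:

For Chinese splitting and word segmentation search, please refer to the github library:
https://github.com/wangfenjin/simple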

https://www.wangfenjin.com/posts/simple-tokenizer/

Hi Luke, will this feature be added for testing in versions after 4.8.0?

July 19, 2022

On 5/27/2022 at 2:21 PM, wolong_zb said:

For example, it does not support Chinese character search starting from any position, nor does it support pinyin initial letter search.

Yes, it would be highly appreciated if the search function could include searching Chinese media using Pinyin, the romanization of Chinese characters in the Latin alphabet. Additionally, though not directly related to this feature request, it would be great if the sort title for all media in Emby could use Pinyin instead of the current sorting algorithm. Currently I have no idea where to find a specific artist while scrolling through my music artists tab. Alternatively, the ability to read the sort title from embedded file metadata could be a temporary solution, since we could simply write the sort title with an automated script; the current approach of storing edited metadata in the database requires manually adding a sort title to each media item to achieve the desired effect. Thanks in advance!
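
As a rough illustration of the Pinyin sort title idea, the Python sketch below derives a sort key from each artist name. The `PINYIN` table is a hypothetical, hand-written mapping for three example artists only, and `pinyin_sort_title` is an illustrative name, not an Emby or metadata-tool function.

```python
# Hypothetical, hand-written pinyin table covering only the demo artists.
PINYIN = {
    "周": "zhou", "杰": "jie", "伦": "lun",
    "陈": "chen", "奕": "yi", "迅": "xun",
    "王": "wang", "菲": "fei",
}


def pinyin_sort_title(name, table=PINYIN):
    """Build a sort title by spelling each character in pinyin; unknown characters pass through."""
    return " ".join(table.get(ch, ch.lower()) for ch in name)


artists = ["周杰伦", "王菲", "陈奕迅"]

# Default sorting compares Unicode code points, which has no relation to pronunciation.
by_code_point = sorted(artists)

# Sorting on the derived sort title groups artists alphabetically by pinyin.
by_pinyin = sorted(artists, key=pinyin_sort_title)

print(by_code_point)
print(by_pinyin)
```

A script along these lines could write the derived string into each file's sort-title tag, assuming the server then reads that tag.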

September 5, 2022

On 5/17/2022 at 2:31 AM, Luke said:

We'd have to know exactly what they did, but I'm guessing they probably just updated the embedded sqlite build.

Hey Luke, I am happy to share the details of the so-called fuzzy search; I am the maker of the patch posted by ds2355.

I made my own version of an extended FTS5 index, which is better for Chinese only. It uses a specialized tokenizer that is brought in by loading a binary .so/.dll extension into SQLite.

This is not really applicable to the main version, as it modifies the structure of the original database and also needs some edits to SQLitePCL.raw, and it is quite hard to restore the database to what it was. BTW, the mainline SQLitePCL.raw has gained some good new features lately, and hopefully we can follow that progress. Although its string handling differs from the version used by Emby, I think the mainline version is better.

I posted a hands-on tutorial only for those who have enough knowledge of SQLite, CMake builds, and .NET, because they really know what they're doing with the irreversible database changes. Everyone should be responsible for their own mods, so I didn't give out a complete build, but a video and a full text tutorial instead. Anyone who needs it must do it all themselves, and I think that is fair.

This seemingly smart guy kept sharing my mods here, which are not applicable to everyone, without asking me. I am quite mad about it, but there's nothing I can do.

I hereby strongly condemn ds2355 for stealing my code and publishing it without my knowledge, and I hope that Luke ignores all of these unreasonable pleas from this man of low morals.

To ds2355: You repeatedly republished my mods without asking my permission, and you keep harassing the official author, which is nasty. I'm making it clear to you that none of the changes I've made are intended for Emby's main user base; they target niche needs. Your behavior has angered me, and I won't share related tutorials or code from now on. Please try not to let me find out who you are, otherwise I will take revenge to the greatest extent possible against you personally.

Edited September 5, 2022 by Josephus

September 5, 2022

Hi, thanks for sharing. If the code for this were available we'd certainly take a look at it.

September 7, 2022

On 9/5/2022 at 11:53 PM, Luke said:

Hi, thanks for sharing. If the code for this were available we'd certainly take a look at it.

What I did was explicitly enable the extension-loading option, using my own SQLitePCL.raw build, and then load the prebuilt .so/.dll file of simple-tokenizer (https://www.wangfenjin.com/posts/simple-tokenizer/, useful for Chinese only) to drop the old FTS5 database and build a new one with a better Chinese index. The release on GitHub is not valid; it should be built locally.
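
The steps described above can be sketched roughly as follows, using Python's built-in sqlite3 module rather than SQLitePCL.raw. This is an illustration under assumptions, not the actual patch: the function name, the table name `title_index`, and the extension path are hypothetical, and the tokenizer name `simple` is assumed to be the name the simple-tokenizer extension registers. The sketch builds its own table and is not meant to be run against an Emby database.

```python
import os
import sqlite3


def build_chinese_title_index(db_path, extension_path, titles):
    """Load a tokenizer extension and build an FTS5 table with it.

    Returns False when the extension file is missing or this Python build
    cannot load SQLite extensions; returns True once the index is built.
    """
    if not os.path.exists(extension_path):
        return False
    conn = sqlite3.connect(db_path)
    try:
        if not hasattr(conn, "enable_load_extension"):
            return False
        # Extension loading is off by default and must be enabled explicitly.
        conn.enable_load_extension(True)
        conn.load_extension(extension_path)
        conn.enable_load_extension(False)
        # Drop the old index and rebuild it with the extension's tokenizer.
        # 'simple' is an assumed tokenizer name (see the lead-in).
        conn.execute("DROP TABLE IF EXISTS title_index")
        conn.execute(
            "CREATE VIRTUAL TABLE title_index USING fts5(title, tokenize = 'simple')"
        )
        conn.executemany(
            "INSERT INTO title_index(title) VALUES (?)", [(t,) for t in titles]
        )
        conn.commit()
        return True
    finally:
        conn.close()


# With no extension file present, nothing is created and the call returns False.
print(build_chinese_title_index(":memory:", "/nonexistent/libsimple.so", ["美女与野兽"]))
```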

There is NO benefit for non-Chinese users with this extension, and it corrupts the original database. The extension builds its index the same way as WeChat (an IM app for Chinese users). Chinese is quite different from English and other Latin-script languages because its text has fewer syllables, and it is very difficult for developers who are not familiar with it. Each Chinese character has a corresponding Latin spelling (pinyin), and the extension provides both a full-spelling index and an initials index. Chinese words also differ from English words: English can rely on spaces for word segmentation, while Chinese can only be segmented and searched through a preset vocabulary.
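
The difference between space-based and vocabulary-based segmentation can be shown with a small Python sketch. It uses forward maximum matching, a common simple technique chosen here for illustration; it is not necessarily how the extension or WeChat segments text, and the two-word vocabulary is made up for the demo.

```python
def segment_by_spaces(text):
    """English-style segmentation: split on whitespace."""
    return text.split()


def segment_by_vocabulary(text, vocabulary):
    """Forward maximum matching: at each position take the longest vocabulary word,
    falling back to a single character when nothing matches."""
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


vocabulary = {"美女", "野兽"}  # hypothetical preset vocabulary

print(segment_by_spaces("Beauty and the Beast"))
print(segment_by_spaces("美女与野兽"))  # no spaces, so the whole title stays one token
print(segment_by_vocabulary("美女与野兽", vocabulary))
```

Splitting on spaces leaves the Chinese title as a single token, which is why a query for a word in the middle of it finds nothing, while the vocabulary-based pass yields separate searchable words.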

Maybe Luke can consider changing the default tokenizer to something like the ICU tokenizer. It makes no difference for Latin-script languages, and it seems to be better for non-Latin ones when it comes to word segmentation.

Edited September 7, 2022 by Josephus


Posts above, in order, by: ds2355, Luke, ds2355, ds2355, Luke, ds2355, wolong_zb, wolong_zb, ds2355, SheepYY039, Josephus, Luke, Josephus
