... | ... | @@ -5,6 +5,8 @@ |
|
|
|
|
|
[Clustering Summarizer](https://git.opendfki.de/reuschling/dynaq4solr/wikis/modules#clustering-summarizer)
|
|
|
|
|
|
[Collaborative Filtering](https://git.opendfki.de/reuschling/dynaq4solr/wikis/modules#collaborative-filtering)
|
|
|
|
|
|
[Contextualization](https://git.opendfki.de/reuschling/dynaq4solr/wikis/modules#contextualization)
|
|
|
|
|
|
[Document group summarizer](https://git.opendfki.de/reuschling/dynaq4solr/wikis/modules#document-group-summarizer)
|
... | ... | @@ -75,6 +77,49 @@ http://earlytrendradarservice.kl.dfki.de/solr/etrCollection/clusteringSTC?q=%2Bd |
|
|
|
|
|
---
|
|
|
|
|
|
#### Collaborative Filtering ###
|
|
|
|
|
|
The collaborative filtering (CF) module lets you run arbitrary CF queries and is designed to be independent of the data structures inside the index.
|
|
|
The module does not distinguish between classical, predefined 'item' and 'user' roles. Instead, the attributes that serve as ids and as references between entities are defined as part of the query. The query syntax also does not force you to choose between predefined query forms such as item-user-item or user-item-user. The CF module takes a 'chain of id attributes' of arbitrary length, where each chain link defines one hop between two entities. This allows much more flexible queries, such as user-itemType1-itemType2-usergroup-itemType3-user and so on.
|
|
|
|
|
|
___solrconfig.xml___ entry:
|
|
|
|
|
|
```
<searchComponent name="dynaQCFComponent" enable="true" class="de.dfki.km.dynaq.cf.CollaborativeFilteringSearchComponent"></searchComponent>

<requestHandler name="/cf" class="de.dfki.km.dynaq.util.SearchHandlerWithoutComponents">
  <arr name="first-components">
    <str>dynaQCFComponent</str>
  </arr>
</requestHandler>
```
|
|
|
|
|
|
__Parameters:__
|
|
|
|
|
|
|
|
|
* &idAttributeChain=[searchIn,extractFrom,querySuffix][..] ... [..]
  * querySuffix is optional; [searchIn,extractFrom] is valid
  * Abbreviation for the first chain link: [extractFrom]
  * Abbreviation for the last chain link: [searchIn]
|
|
|
|
|
|
* &chainRows=number, default:5000
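
To make the bracket syntax and its two abbreviations concrete, here is a minimal parsing sketch in Python. It is not the module's own parser: the names `ChainLink` and `parse_id_attribute_chain` are hypothetical, and the handling of a one-element link in the middle of a chain (rejected here) is an assumption, since the syntax above only defines the abbreviation for the first and the last link.

```python
import re
from typing import List, NamedTuple, Optional


class ChainLink(NamedTuple):
    """One hop of the chain; a missing part is represented as None."""
    search_in: Optional[str]
    extract_from: Optional[str]
    query_suffix: Optional[str]


def parse_id_attribute_chain(chain: str) -> List[ChainLink]:
    """Split '[a,b,c][d,e]...' into links and expand the abbreviations."""
    bodies = re.findall(r"\[([^\[\]]*)\]", chain)
    # Reject anything that is not a plain sequence of [...] groups.
    if not bodies or "".join(f"[{b}]" for b in bodies) != chain.strip():
        raise ValueError(f"malformed chain: {chain!r}")

    links = []
    last = len(bodies) - 1
    for position, body in enumerate(bodies):
        # Split on the first two commas only, so a querySuffix may contain commas.
        parts = [part.strip() or None for part in body.split(",", 2)]
        if len(parts) == 1:
            if position == 0:
                parts = [None, parts[0], None]   # [extractFrom]
            elif position == last:
                parts = [parts[0], None, None]   # [searchIn]
            else:
                # Assumption: the short form is only defined for first and last link.
                raise ValueError(f"ambiguous chain link: [{body}]")
        else:
            parts += [None] * (3 - len(parts))   # [searchIn,extractFrom]
        links.append(ChainLink(*parts))
    return links
```

With this sketch, `[50StarUser_ss][userId_sv,50StarMovie_ss,+Content-Type:user][dataEntityId]` and its fully written form `[,50StarUser_ss,][userId_sv,50StarMovie_ss,+Content-Type:user][dataEntityId,,]` parse to the same three links.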
|
|
|
|
|
|
|
|
|
1. The system searches for the specified query (&q) or, if this is not the first chain link, for the values extracted in the previous chain link. These values are searched in the index attribute specified with 'searchIn' and are boosted according to their counts. A query given as 'querySuffix' can be appended to this search, for example to filter out unwanted documents in this hop: search in the general 'id' field of the index, but consider only documents of type 'user' (querySuffix '+Content-Type:user'). The result list is pruned according to the '&chainRows' parameter. If there is a succeeding chain link, processing continues with point 2. Otherwise, the current result list is the final result, subject to the &fl, &fq and &rows parameters.
|
|
|
|
|
|
2. Next, the system takes the result list from point 1 and extracts all values that the result documents hold under the attribute specified with 'extractFrom'. The values are counted, and the counts act as scores, which gives the values a meaningful order. Normally, these values are ids of other entities, e.g. userIds or itemIds. The number of extracted values is pruned according to the '&chainRows' parameter. The system then moves on to the next chain link and continues with point 1.
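
The two points above can be sketched as a small in-memory simulation in Python. This is only an illustration of the control flow, not the module's implementation: it ignores Solr's relevance scoring, models &q and querySuffix as plain Python predicates, and uses invented toy documents. The names `run_chain` and `values_of` are hypothetical.

```python
from collections import Counter


def values_of(doc, field):
    """Return the values of a field as a list (fields may be single- or multi-valued)."""
    value = doc.get(field, [])
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def run_chain(docs, initial_query, chain, chain_rows=5000, rows=10):
    """chain: list of (search_in, extract_from, suffix_predicate_or_None)."""
    weights = Counter()  # values extracted in the previous link, with their counts
    for position, (search_in, extract_from, suffix) in enumerate(chain):
        # Point 1: search for &q (first link) or for the extracted values,
        # boosting each document by the counts of the values it matches.
        scored = []
        for doc in docs:
            if position == 0:
                score = 1 if initial_query(doc) else 0
            else:
                score = sum(weights[v] for v in values_of(doc, search_in))
            if score > 0 and (suffix is None or suffix(doc)):
                scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        hits = scored[:chain_rows]
        if position == len(chain) - 1:
            return hits[:rows]  # last link: this is the final result list
        # Point 2: extract and count the values under 'extractFrom'.
        counts = Counter(v for _, doc in hits for v in values_of(doc, extract_from))
        weights = Counter(dict(counts.most_common(chain_rows)))
    return []


# Invented toy index: movies list the users who rated them five stars, and vice versa.
docs = [
    {"dataEntityId": "m1", "Content-Type": "movie", "50StarUser_ss": ["u1", "u2"]},
    {"dataEntityId": "m2", "Content-Type": "movie", "50StarUser_ss": ["u1", "u2"]},
    {"dataEntityId": "m3", "Content-Type": "movie", "50StarUser_ss": ["u2"]},
    {"dataEntityId": "m4", "Content-Type": "movie", "50StarUser_ss": ["u3"]},
    {"dataEntityId": "u1", "Content-Type": "user", "userId_sv": "u1", "50StarMovie_ss": ["m1", "m2"]},
    {"dataEntityId": "u2", "Content-Type": "user", "userId_sv": "u2", "50StarMovie_ss": ["m1", "m2", "m3"]},
    {"dataEntityId": "u3", "Content-Type": "user", "userId_sv": "u3", "50StarMovie_ss": ["m4"]},
]

def is_user(doc):
    return doc.get("Content-Type") == "user"

result = run_chain(
    docs,
    initial_query=lambda doc: doc.get("dataEntityId") == "m1",
    chain=[(None, "50StarUser_ss", None),
           ("userId_sv", "50StarMovie_ss", is_user),
           ("dataEntityId", None, None)],
)
for score, doc in result:
    print(score, doc["dataEntityId"])
```

In this toy run the query movie `m1` leads to users `u1` and `u2`, whose five-star movies are then counted. `m4` never appears because no user reached in the chain rated it.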
|
|
|
|
|
|
|
|
|
|
|
|
__Examples:__
|
|
|
```
http://koeln:8014/solr/movielens/cf?q=dataEntityId:m2628&idAttributeChain=[,50StarUser_ss,][userId_sv,50StarMovie_ss,%2BContent-Type:user][dataEntityId,,]&fl=score,dataEntityId,title&rows=13

same as

http://koeln:8014/solr/movielens/cf?q=dataEntityId:m2628&idAttributeChain=[50StarUser_ss][userId_sv,50StarMovie_ss,%2BContent-Type:user][dataEntityId]&fl=score,dataEntityId,title&rows=13&chainRows=5000
```
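
When such a request is built programmatically, the `+` in the querySuffix has to be percent-encoded as `%2B`, as in the URLs above, because a literal `+` in a query string is decoded as a space. A minimal Python sketch follows; `build_cf_url` is a hypothetical helper, not part of DynaQ. `urlencode` also escapes brackets, commas and colons, which decodes to the same parameter values as the hand-written URLs.

```python
from urllib.parse import urlencode


def build_cf_url(base_url, query, chain, fl=None, rows=None, chain_rows=None):
    """Assemble a /cf request URL; optional parameters are omitted when None."""
    params = [("q", query), ("idAttributeChain", chain)]
    if fl is not None:
        params.append(("fl", fl))
    if rows is not None:
        params.append(("rows", str(rows)))
    if chain_rows is not None:
        params.append(("chainRows", str(chain_rows)))
    # urlencode percent-encodes reserved characters, including '+' as '%2B'.
    return f"{base_url}?{urlencode(params)}"


url = build_cf_url(
    "http://koeln:8014/solr/movielens/cf",
    query="dataEntityId:m2628",
    chain="[50StarUser_ss][userId_sv,50StarMovie_ss,+Content-Type:user][dataEntityId]",
    fl="score,dataEntityId,title",
    rows=13,
)
print(url)
```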
|
|
|
|
|
|
---
|
|
|
|
|
|
#### Contextualization ###
|
|
|
The DynaQ module __ContextDocsSearchComponent__ lets you contextualize your queries with documents that describe the topic or context you want to search in. For example, suppose you want to search within the domain of fish and you have a huge index of pet forenames. You search for 'harry' and receive birds, cats, and fish. Instead of adding a new search term 'fish', you can set one or more (possibly preconfigured) fish documents as context. The fish named 'harry' will then appear at the top of your result list. If you omit 'harry', you will receive any fish documents in your corpus (fuzzy), found by a statistical document similarity search.
|
|
|
|
... | ... | |