Overview
Search is one of the most powerful features in Simpplr. It's prominently displayed at the top left of every page because data shows that intranet users want to get in, complete their task, and get out. For this reason we put search front and center, and we’ve invested heavily in making our search best in class. Search is powerful for three reasons: it’s smart, federated, and curated. Search results are also faceted. More on this below.
Search features
Smart
Simpplr search is smart. It's powered by Elasticsearch, and it takes into account your Simpplr profile data, such as geographic location and department, to serve personalized search results. As a result, one user's search results may differ from another user's results for the same query.
The adaptive machine learning also ensures your search results will get better over time the more you use it. The search algorithms continue to learn as you use Simpplr.
Simpplr search will also show you recent searches you've made each time you go to search. This makes it easy to find recently used information.
Finally, our auto-suggested results feature suggests results based on the initial characters you type into the search box. This saves time and allows users to be more efficient.
Federated
Simpplr Search is federated, meaning in addition to searching content from the entire intranet, it also searches any integrations plugged into Simpplr, such as file repositories like Google MyDrive. Again, this allows your team members to find any content from one centralized location.
Curated
As mentioned above, Search provides personalized results at the user level. Site managers can also configure search so that only relevant content is shown. For example, Site managers may want to block an outdated Benefits folder on the HR site from appearing in search results. While keeping the 2020 benefits on the intranet, they can choose to show the 2021 benefits folder in the search results instead.
How does Simpplr search work?
There are various ways to weight search results. The main types we use are relevancy, recency, popularity, and personalization. We’ll discuss what these mean below.
These weightings can be added cumulatively or work independently. When the different methods are combined, the overall outcome is often difficult to predict.
Search Type | Weighting being used |
Content | Relevancy > Recency |
People | Relevancy > Recency |
Sites | Relevancy > Recency |
Files | Relevancy > Recency |
Relevancy
Relevancy is the starting point for all searches. This is the process of matching the query to results.
Some analyzers are applied both when indexing documents and when processing the query:
- Lowercase
  - Words are converted to lowercase to increase the chance of a match
- Remove special characters
  - E.g. ‘Wi-fi’ matches ‘wifi’ and ‘wi fi’
- Stemming
  - E.g. ‘runs’ matches ‘runs’, ‘running’, ‘runners’
- Stop words removed
  - Common words such as ‘a’, ‘to’, and ‘be’ are removed to retrieve more accurate results
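The analysis steps above can be illustrated with a minimal Python sketch. This is not Simpplr's or Elasticsearch's actual analyzer chain: the stop-word list, the `toy_stem` helper, and the `analyze` function are hypothetical names invented for illustration, and the stemmer is a deliberately naive stand-in for a real stemming algorithm. It only covers the hyphen case (‘Wi-Fi’ to ‘wifi’), not the spaced ‘wi fi’ variant.

```python
import re

# Hypothetical stop-word list; a real analyzer ships a much longer one.
STOP_WORDS = {"a", "an", "the", "to", "be", "is", "of", "and"}


def toy_stem(token: str) -> str:
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Collapse a doubled final consonant, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, drop special characters, remove stop words, then stem."""
    lowered = text.lower()
    cleaned = re.sub(r"[^a-z0-9\s]", "", lowered)  # "wi-fi" -> "wifi"
    tokens = cleaned.split()
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("Running to the Wi-Fi"))
print(analyze("runs"), analyze("runners"))
```

Because the same function is applied to both the indexed text and the query, ‘runs’, ‘running’, and ‘runners’ all reduce to the same token and can match each other.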
Some steps happen only while querying:
- TF-IDF (Term Frequency-Inverse Document Frequency)
  - A word that appears in many documents is given much less weight, which helps keywords have more prominence in the results
- Mismatched spellings
  - Fuzzy matching covers misspellings and matches data even when there are multiple differences
  - The longer the query, the greater the threshold for mistakes
  - Mistakes can be any letters that are incorrectly added
  - We have gone with the standard fuzzy matching rules of Elasticsearch
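As a rough illustration of the two query-time ideas above, the sketch below shows a smoothed inverse document frequency and a length-based typo allowance. The thresholds in `allowed_edits` follow Elasticsearch's documented `AUTO` fuzziness defaults (terms of 1-2 characters must match exactly, 3-5 characters allow one edit, longer terms allow two); whether Simpplr uses exactly this setting is an assumption. The `idf` formula is one common smoothed variant, not Elasticsearch's exact scoring, and the edit distance is plain Levenshtein, which does not count a transposition as a single edit.

```python
import math


def idf(term: str, documents: list[set[str]]) -> float:
    """Smoothed inverse document frequency: rarer terms score higher."""
    containing = sum(1 for doc in documents if term in doc)
    return math.log((1 + len(documents)) / (1 + containing)) + 1


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def allowed_edits(term: str) -> int:
    """Typo allowance by term length (assumed AUTO-style thresholds)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query: str, candidate: str) -> bool:
    """True when the candidate is within the allowance for this query term."""
    return edit_distance(query.lower(), candidate.lower()) <= allowed_edits(query)


docs = [{"company", "update"}, {"company", "benefits"}, {"company", "policy"}]
print(idf("company", docs) < idf("benefits", docs))  # common word weighs less
print(fuzzy_match("benifits", "benefits"))           # one typo in a long term
print(fuzzy_match("hr", "hq"))                       # short terms must be exact
```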
If there is only one result from the search, then relevancy is all that's needed. When there are multiple results, we will consider other factors.
Prefix matching
Prefix matching is used to complete a search term by predicting the ending based on the prefix you’ve typed.
Although this is a powerful feature in search it increases the index size vastly:
- E.g. To use prefix matching for the term 'Adam', the index will need to store A, Ad, Ada and Adam.
If prefix matching were used on the whole index, it would slow the search function down considerably and could return very confusing results. Because of this limitation, prefix matching is used sparingly.
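To make the index-size cost concrete, here is a small sketch of how prefixes multiply the number of stored entries. The `edge_ngrams` helper and the sample names are hypothetical; Elasticsearch offers a comparable edge n-gram tokenizer, but this code does not reproduce its configuration.

```python
def edge_ngrams(term: str, min_len: int = 1) -> list[str]:
    """Return every leading prefix of the term, lowercased for matching."""
    term = term.lower()
    return [term[:n] for n in range(min_len, len(term) + 1)]


names = ["Adam", "Jonathan", "Jovita"]  # illustrative sample data

exact_entries = len(names)
prefix_entries = sum(len(edge_ngrams(name)) for name in names)

print(edge_ngrams("Adam"))
print(exact_entries, prefix_entries)
```

Each term contributes one entry per character, so three names become 18 prefix entries here. Applied to full content bodies rather than short titles, that growth is what makes prefix matching expensive.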
The autocomplete function uses prefix matching for all titles:
- Site names
- People names
- Content titles
The global search only uses prefix matching on:
- People names
We added prefix matching on people names in the global search because, without it, the results seemed odd at times:
- E.g. Typing ‘Jo’ into autocomplete would return the results ‘Joe’, ‘Jonathan’, and ‘Jovita’, but a global search for ‘Jo’ would return no results.
  - Now that we have prefix matching in the global search, this isn’t a problem.
  - However, for the sake of index size and search speed, we decided not to include it for site names and content titles in the global search.
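The ‘Jo’ example can be reproduced with a toy index. This is a sketch under stated assumptions: `build_index` and `search` are hypothetical helpers, the people list is sample data, and a real search engine stores far more than a dictionary of terms.

```python
from collections import defaultdict

PEOPLE = ["Joe", "Jonathan", "Jovita", "Adam"]  # illustrative sample data


def build_index(names: list[str], with_prefixes: bool) -> dict[str, set[str]]:
    """Map lookup keys to names; optionally store every leading prefix too."""
    index: dict[str, set[str]] = defaultdict(set)
    for name in names:
        key = name.lower()
        keys = [key[:n] for n in range(1, len(key) + 1)] if with_prefixes else [key]
        for k in keys:
            index[k].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Look up the query as a whole key; no scanning of other keys."""
    return sorted(index.get(query.lower(), set()))


exact_index = build_index(PEOPLE, with_prefixes=False)
prefix_index = build_index(PEOPLE, with_prefixes=True)

print(search(exact_index, "Jo"))   # whole-term index: nothing is stored under "jo"
print(search(prefix_index, "Jo"))  # prefix index: every name starting with "jo"
```

The whole-term index only answers full names such as ‘Joe’, which is why a global search without prefix matching returned nothing for ‘Jo’.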
Recency
Having spent some time experimenting with different combinations of search functions, Simpplr decided that within the confines of an intranet, for the vast majority of content, the most important factor for weighting results (apart from relevancy) is recency.
- Examples:
  - If you search for ‘Company Update’, there may be hundreds of results, but the one you most likely want to view is the most recent
  - If you search for ‘Benefits’, you want the most recent version of the Benefits policy to appear at the top of the results
Although there may be some occasions where the recency of a piece of content is not as important as its popularity, we feel this will be an exception.
The result is that when users search, the results should generally appear in reverse chronological order, with the most recent first.
- There may be some discrepancies when one piece of content is more relevant to the search term than others, so it ends up with a higher weighting.
  - E.g. A piece of content contains the search term multiple times in the title and summary
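One way to picture how relevancy and recency interact is a score that multiplies a text-match score by a decay factor based on age. The formula, the 30-day half-life, and all names below are assumptions made for illustration; Simpplr's actual weighting is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    relevancy: float  # text-match score from the relevancy step
    age_days: float   # days since the content was published


def recency_factor(age_days: float, half_life_days: float = 30.0) -> float:
    """Hypothetical exponential decay: the factor halves every half-life."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(hit: Hit) -> float:
    """Relevancy scaled by recency (an assumed way of combining the two)."""
    return hit.relevancy * recency_factor(hit.age_days)


def rank(hits: list[Hit]) -> list[Hit]:
    return sorted(hits, key=combined_score, reverse=True)


hits = [
    Hit("Company Update (this week)", relevancy=1.0, age_days=0),
    Hit("Company Update (last month)", relevancy=1.0, age_days=30),
    Hit("Company Update: update on company updates", relevancy=5.0, age_days=60),
]

for hit in rank(hits):
    print(f"{combined_score(hit):.2f}  {hit.title}")
```

With equal relevancy, the newer update ranks above the older one. The third item is the oldest, but its much higher relevancy score (the term repeated in its title) lifts it to the top, which is the kind of discrepancy described above.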
Numeric matching
Search can find relevant content based on numeric strings you input. For example, if you search 8765, the top results will include content with numeric values containing that string of digits. Often, this can be a policy number you only remember the first few digits of. Search results also include characters found in files uploaded to the intranet, such as PDFs.
‘Did you mean’ suggestions
When no results are found, Simpplr suggests similar content based on your search query. Simpplr uses the phrase suggestion features of Elasticsearch to display more contextual and relevant suggestions. Elasticsearch factors in the popularity of content.
What's included in Global Search and Auto-complete suggestions?
Global search and auto-complete suggestions include results from the following:
- Links tile
- Rich text tile
- HTML tile
- Site information tile
- Apps tab
Tiles results
Apps results
Links results