Search for "WebSocket".
Notice that results 2-4 are duplicates of one another, as are results 5-7.
The duplication is much worse under "Page text matches".
Related to this are the comments on the forums here: http://talk.webplatform.org/forums/index.php/2063/can-search-accuracy-be-improved
*** This bug has been marked as a duplicate of bug 19351 ***
I do not believe this bug is a duplicate of bug 19351. That one refers to the snippet text that appears in search results; this one refers to the fact that many duplicate results appear for a single search.
Both would be fixed by an upgrade to the search function. I have cross-referenced the two bugs under See Also.
I think the key to solving this one is to have the search read the rendered output of each page rather than its raw source.
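To illustrate the idea in the previous comment: a minimal Python sketch, assuming "output of the page source" means the rendered HTML of a page. It extracts only the visible text that a search indexer could then work from. The names VisibleText and rendered_text are hypothetical and do not come from the site's code or from MediaWiki.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes from rendered HTML, skipping script and style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.parts.append(data)


def rendered_text(html: str) -> str:
    """Return the visible text of a rendered page with whitespace collapsed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Whether indexing this text would actually remove the duplicate results is not established in this thread; the sketch only shows what "reading the output" could look like.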
Created attachment 1279
Created attachment 1280
Sample for custom search box
Copy the behavior of the #searchform form to post to the search page.
The solution in my previous two attachments uses the Google Custom Search API to perform the search in-page. The default code snippet includes its own search box, but to have a site-wide search box, replace the current form behavior with that of #searchform in the custom search box sample.
Next steps: This code still needs to be integrated into the actual search page and its styles, and the site-wide search box needs to be replaced.
The code snippets in the attachments use a sample Google Custom Search account I have created, but a new account can easily be created at http://www.google.com/cse/manage/create
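As a rough illustration of the site-wide search box behavior described above (submitting the query to the search page), here is a minimal Python sketch of building the URL that such a form would send the browser to. The path "/search" and the parameter name "q" are assumptions made for this example; the thread does not state the actual values, and the attachments remain the reference for the real form.

```python
from urllib.parse import urlencode

# Hypothetical values: neither the results page path nor the query
# parameter name is given in this bug report.
SEARCH_PAGE = "/search"
QUERY_PARAM = "q"


def search_redirect_url(query: str, search_page: str = SEARCH_PAGE) -> str:
    """Return the URL a site-wide search box would navigate to for a query."""
    # Collapse runs of whitespace and trim the ends.
    cleaned = " ".join(query.split())
    if not cleaned:
        # Nothing to search for: go to the bare search page.
        return search_page
    return f"{search_page}?{urlencode({QUERY_PARAM: cleaned})}"
```

For example, search_redirect_url("WebSocket") returns "/search?q=WebSocket", which the in-page Custom Search code could then read to run the query.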
After further discussion about implementation, we found a pre-made extension for MediaWiki at http://www.mediawiki.org/wiki/Extension:Google_Custom_Search_Engine that may be a simpler way to integrate Custom Search Engine.
*** Bug 19351 has been marked as a duplicate of this bug. ***
I just made this bug report the central location for site search issues.
Also, there are a handful of search extensions that we could use (including one for Google Site Search). http://www.mediawiki.org/wiki/Category:Search_extensions has a list of them. We should review a few of them once we have the code online and see which would be best to use.
New location: http://project.webplatform.org/search/issues/6