Commons talk:Media search

Please leave your feedback about the interface and experience of using Special:MediaSearch. Keegan (WMF) (talk) 18:13, 28 May 2020 (UTC)[reply]

Thanks for the first round of feedback

The development team greatly appreciates the feedback left by everyone here, thanks for taking the time. There will be a new round in the near future with design changes to the prototype based on what's been received about the tool, and I'll get a post along with it to explain the influence and purpose of the changes. Keegan (WMF) (talk) 16:28, 22 June 2020 (UTC)[reply]

I'll be updating the page tomorrow with mockups of design changes based on feedback that you can expect to see in the next version of the prototype. Keegan (WMF) (talk) 21:15, 1 July 2020 (UTC)[reply]

Update posted

I've updated the page, please have a look. Keegan (WMF) (talk) 16:58, 2 July 2020 (UTC)[reply]

Copying over what I put on the project page:

Vue.js and the next three weeks or so (13 July 2020)

MediaSearch is being ported to a new software library, vue.js. During the next few weeks while the port takes place, three features will briefly be removed from MediaSearch that will be restored when the port is complete:

  • Autocomplete: this will return as a stand-alone feature.
  • Audio/video playback: this will return as part of the Quick View feature.
  • Filters: these will return as part of the new Filters feature.

I'll update when changes go live. Keegan (WMF) (talk) 18:31, 13 July 2020 (UTC)[reply]

Media Viewer

It would be useful if clicking an image in the grid displayed a Media Viewer, delivering additional data without exiting the search. Currently, clicking a thumbnail redirects you to the file page. Strakhov (talk) 02:16, 25 June 2020 (UTC)

Interesting, thanks for the feedback. Keegan (WMF) (talk) 16:30, 29 June 2020 (UTC)[reply]

New tabs–"Other" and "Categories and Pages"

I've updated the status section with screenshots of new designs. The team is building in a new tab, "Other," to handle file types such as .pdf, .djv, and .stl. The "Categories" tab has been expanded to "Categories and Pages," to cover text pages like talk pages and other non-mainspace areas of Commons. You can expect to see these new tabs live within the next few weeks after the vue.js port is completed.

Speaking of the vue.js port, the changes related to that should be going live later next week (week of 27 July). These changes will temporarily remove a couple of features as I've previously posted about. The features will be returning soon with these new tabs. Thanks for following along, I'll have more information next week after the vue.js version is live. Keegan (WMF) (talk) 17:53, 24 July 2020 (UTC)[reply]

Odd result

Hey, found one that is interesting -- I searched for "Native American" (I was comparing it with the new Google search's handling of a sensitive topic) and it surfaced a few odd results not explainable by text or concepts on the page (including this fish and one of the American Gothic copies). Is there any way to understand or expose why something is in a result, in order to, for example, improve the structured data or give more specific feedback on the search results? Sadads (talk) 22:52, 2 September 2020 (UTC)

@Sadads: This actually isn't exclusively a MediaSearch problem. It's an inherent, long-standing issue with text matching algos with our search backend as you can see from this example with default Commons search. As you can see there, the fish image shows up because it matches both the words "native" and "America" in the description. It's a somewhat similar situation with "American gothic" - compound terms are tricky for the Commons current search algorithm and it tries to match the component words separately as well as combined. Current search relies on text matching algorithms which are usually okay but make key assumptions. MediaSearch makes some improvements, but will work much better with structured data on more files. MediaSearch, as you can see, actually does a better job of surfacing relevant content as it tries to rank/prioritize files with structured data plus a few other tweaks. But, despite having some unique logic of its own, MediaSearch still utilizes the algorithms of the old search too so it will inherit some of that behavior. MediaSearch is still alpha level software and we'll continue to tweak and build upon the improvements we've already made, but by its nature search is imperfect and some inaccurate results are bound to show up. We'll do our best to keep those to a minimum. RIsler (WMF) (talk) 19:58, 3 September 2020 (UTC)[reply]
@RIsler (WMF): Oh that makes sense -- and I totally think this is a huge improvement. I think more what I was asking is around the lines that in the old search you get a hint for the text that "matches" that content (bolded text) -- is that something that you would be getting in the Quickview? -- i.e. a little widget highlighting the text or feature (i.e. structured data) that is being used to drive the result? or even a hover over feature that revealed some of the elements of the algorithm in the back that play heavily in that result. I am going to go in and tweak the language on that result for instance. Sadads (talk) 21:30, 3 September 2020 (UTC)[reply]
For example, I was able to get the fish removed from the result (American Gothic is kind of a bizarre one, both there and in the main results -- it's not registering any use of the word "Native" there). Sadads (talk) 21:41, 3 September 2020 (UTC)
@Sadads: Ah yes. Quickview is actually available now, but it's behind a flag. You can add &quickview=1 to any search result URL (like this) to enable it. That will show you both the filename and the description, which is where there are most likely to be text matches. There are some technical limitations keeping us from highlighting the matching terms but we're looking into it. Not sure how that will play out at the moment. RIsler (WMF) (talk) 22:50, 3 September 2020 (UTC)[reply]
I think for me, even more than highlighting (which would be great), it would be great to expose the elements of the search algorithm that weighed the most in the match, so that there could be an advanced user interface element (like the various ORES tools on English Wikipedia that expose the topic to the user). Then, if something is over-weighting an image toward that search result, it would be easy to zoom in and fix it -- i.e. whether it's the structured data, the caption, random bits of the text, a category, etc. I don't know if this is something that the search interface exposes anywhere. For example, I discovered that this file was featuring prominently in the search for "lion" and it wasn't immediately obvious why until I shifted into the Structured Data tab. Sadads (talk) 12:15, 4 September 2020 (UTC)
That does indeed sound like a cool feature. When it comes to weighting, we have to defer to the expertise of the Search Platform team and I'm not sure how much we can expose the inner workings of CirrusSearch, but I'll mention it to them and see if we can formulate a plan for that kind of advanced curation tool. RIsler (WMF) (talk) 19:37, 8 September 2020 (UTC)[reply]
@Sadads and RIsler (WMF): as far as I know the "text" field is one of the most important fields for the search engine to work on. If you look at the search contents for the American Gothic English Wikipedia article, you'll notice that the "text" field contains the article.
The text field here is a mess. Take the example file: the only content of the text field is "English", so it seems to fall back to "auxiliary_text", which does contain both "Native" and "American". Time should be spent on improving the search indexing. This whole tool is built on top of a standard search API query, so if the index contains garbage, the results will contain garbage. Multichill (talk) 18:48, 23 September 2020 (UTC)
@Multichill: Thanks for pointing that out. We plan to look into some longstanding Commons search issues in addition to improving the methodology by incorporating structured data into the weighting as we build out MediaSearch. RIsler (WMF) (talk) 21:20, 24 September 2020 (UTC)[reply]
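
As a small illustration of the `&quickview=1` flag mentioned earlier in this thread, here is a minimal Python sketch that adds the parameter to an existing search result URL. The helper name `with_quickview` and the example query are made up for illustration; only the `quickview=1` parameter itself comes from the discussion above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_quickview(url: str) -> str:
    """Return `url` with quickview=1 set (hypothetical helper name).

    Any existing `quickview` parameter is replaced, so calling the
    function twice gives the same result as calling it once.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous quickview value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "quickview"
    ]
    params.append(("quickview", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))


example = "https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=Native+American"
print(with_quickview(example))
```

The flag was described as temporary ("behind a flag"), so this may stop being necessary once Quick View is enabled by default.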

Trying the tool with Polish terms

I was trying the tool using Polish terms. Some observations:

  • I tried "Księżyc" (moon) and was shown a bunch of photos seemingly unrelated to the moon. The issue was that the images were cropped, sometimes removing most of the photo, and the moon was often in the cropped-out part. It would be nice to be able to opt out of photo cropping.
  • I tried "glowa żyrafy" (giraffe head) and got 2 shots of a giraffe's head.
  • I tried "łysa głowa" (bald head) and got nothing. I tried "Lysy" or "łysy" (bald) and got files by User:Lysy. I tried "Łysina" (baldness) and got a lot of places in Poland with that word in the filename. I tried "bald head" and got a lot of images but very few of bald heads. I tried "shaven head" and got a few more images. "Ogolona glowa" gave me nothing. So the search for bald heads works only in English.
  • I tried "pasikonik" (grasshopper) and got a lot of files with grasshoppers (without word "pasikonik" in the file page) and some images by User:Pasikonik1979
  • I tried "komorka" (cell phone, cell in a tissue, shed) and got a lot of images with that word in the filename. No other cell phones or sheds.

--Jarekt (talk) 01:15, 24 September 2020 (UTC)[reply]

Great, thanks for the details. Keegan (WMF) (talk) 16:56, 24 September 2020 (UTC)[reply]

ChristianKl's thoughts

  • I searched for "Baum" both with the setting of German and English. In both cases it lists images made by Dein Freund der Baum that don't seem to match my search intent. I would expect that it would make sense to downrank the user name in the relevancy search. ChristianKl (talk) 18:30, 24 September 2020 (UTC)[reply]
  • My search for "femur" brought up images that are NSFW when searching in German. Part of the issue seems to be that (Femur) is missing from the aliases of the relevant item, but it's unclear to me why those images end up in the search results. While NSFW images aren't of great concern to myself, some people have a problem with seeing NSFW images that they didn't ask for and it might be worth thinking about how to deal with the issue. ChristianKl (talk) 18:30, 24 September 2020 (UTC)[reply]
  • When it comes to search suggestions it would be nice to have search suggestions that propose ways to clarify what sense of a word is meant. When I search "apple" it would be nice if the search suggestions would show "apple (fruit)", "Apple Inc", "Apple (family name)". I do understand that this is a more complex feature request but if it would be possible to implement such functionality in the search it would be great. ChristianKl (talk) 18:30, 24 September 2020 (UTC)[reply]

Display error on mobile

For whatever reason when I use this feature it doesn't display thumbnails on mobile, I don't get them on the "desktop version" of Wikimedia Commons either. Does anyone else have this? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 18:50, 24 September 2020 (UTC)[reply]

@Donald Trung: Thanks for reporting this. This is a bug with our lazy loading code on some mobile browsers. It will be addressed in an upcoming update. RIsler (WMF) (talk) 21:29, 24 September 2020 (UTC)[reply]
Thanks for the response, I think that I'll then try it on another device. The graphic user interface looks great so far. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:41, 24 September 2020 (UTC)[reply]

How it looks

It looks a lot like a typical Ecosia, Microsoft Bing, Google, Verizon's Yahoo!, DuckDuckGo, etc. image search. I really like this design because it will be familiar to most internauts (or whatever you call people who use the internet). Overall I would say kudos to the team for getting the design right immediately; it looks like a built-in advanced search engine and a welcome upgrade of the standard search bar. Keep up the good work everyone. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:04, 24 September 2020 (UTC)

Text excerpts in Categories and Pages, and order of tabs

Overall I really like the UX of this. The one change I think is needed is adding in the text excerpts to the "Categories and Pages" results. That would then cover all my needs when searching for policy/help pages and for discussions, as well as for media files.

I also tentatively suggest moving the "Other" tab next to the "Video" tab, because the "Other" file-types are still media, whereas the "Categories and Pages" are media-adjacent. Quiddity (talk) 17:39, 25 September 2020 (UTC)[reply]

Medium size and other inputs

Hi. Firstly, I tried Media Search in French, and at first glance I saw no issues. Secondly, the results display strangely for the "medium" size selection. For example, when searching for this DOI, 10.3897/zookeys.740.20458, the relevance of the results is fine, both for the "images" and for the "Categories and Pages". However, when I select size "medium", three images are returned, which is in itself perfectly fine, but the first two images are zoomed in (maybe even upscaled), which is not visually pleasing. Thirdly, when clicking on one of the thumbnails in the results, the window that opens with the details is very good; little arrows on each side of the selected image, to go to the previous or next result, could be a good addition to this window. Christian Ferrer (talk) 21:10, 26 September 2020 (UTC)

Thanks for the feedback! We are aware of the blurry images you mentioned and are working to improve how we display images in the grid to resolve that. Also, good idea about the arrows allowing you to quickly navigate between images. MWilliams (WMF) (talk) 17:16, 30 September 2020 (UTC)[reply]

Audio and Video

It could be interesting to add the following criteria, similar to the size filter for pictures:

  1. size for video
  2. file duration for audio and video

Djiboun (talk) 21:24, 26 September 2020 (UTC)[reply]

New filter: assessment

Hi, it could be a good idea to add an "Assessment" filter with three possible choices: Quality, Featured, and Valued. This would highlight, in one click and within the results, the images that have Wikimedia Commons valued image (Q63348040), Wikimedia Commons quality image (Q63348069) or Wikimedia Commons featured picture (Q63348049) in Structured Data. It is much the same principle as Help:FastCCI, but now using Structured Data. Christian Ferrer (talk) 07:07, 29 September 2020 (UTC)

I just would like to second this request – IMHO it is a very good idea. When search results include many many media, it can be very useful for the user to limit the results e.g. to Quality images. Thank you very much, --Aristeas (talk) 17:24, 2 March 2021 (UTC)[reply]
Thanks both for the feedback. I've created phab:T276257 to track this feature request. CBogen (WMF) (talk) 18:38, 2 March 2021 (UTC)[reply]

Still useless

I enter "Vojníkov" and I expect images of or from the village called Vojníkov, but I get a lot of crap. Why isn't structured data alone used? --Juandev (talk) 16:43, 30 September 2020 (UTC)

A "structured data only" filter sounds interesting, I'll pass it along. Keegan (WMF) (talk) 17:09, 2 October 2020 (UTC)[reply]
+1. Please. Strakhov (talk) 14:42, 3 October 2020 (UTC)[reply]
I'm looking forward to using this feature in Wikipedia, but it's still flawed. When entering a Wikidata ID in the search box, for example Q90, results may include:
  • "Structured data results". Those using that Wikidata id in Structured Data (P180, P170,...). These ones should be prioritized.
  • "Category results". Those files included in a Category sitelinked from the Wikidata ID (or P373-ed). Categorisation depth could be 2, 3, 4,.... These would be useful when there are not many files using the Wikidata ID yet.
  • "String results". Those files including "Paris" somewhere (description, caption, filename...).
IMHO it should be possible to exclude the third kind on request (for example https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=Q90&avoid=string or something like that). Strakhov (talk) 13:45, 10 October 2020 (UTC)
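
The `avoid=string` parameter above is the commenter's own hypothetical and does not exist. As a rough approximation of the "structured data results" class, the sketch below builds a search URL using the `haswbstatement` keyword from WikibaseCirrusSearch, which matches only files carrying a given statement. It assumes that Special:MediaSearch passes search keywords through to the search backend unchanged; the function name and defaults are illustrative.

```python
from urllib.parse import urlencode

MEDIASEARCH = "https://commons.wikimedia.org/wiki/Special:MediaSearch"


def structured_only_url(qid: str, prop: str = "P180", media_type: str = "bitmap") -> str:
    """Build a search URL matching only files with the statement prop=qid.

    Hypothetical helper: P180 ("depicts") is used as the default property,
    and `haswbstatement` is assumed to be honoured by MediaSearch.
    """
    query = f"haswbstatement:{prop}={qid}"
    return f"{MEDIASEARCH}?{urlencode({'type': media_type, 'q': query})}"


# Files that depict Paris (Q90), ignoring plain text matches on "Paris".
print(structured_only_url("Q90"))
# Files whose creator (P170) is Q90, for comparison with the P170 case above.
print(structured_only_url("Q90", prop="P170"))
```

This covers only the first of the three result classes; the category-based class would need a separate category keyword.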

Intersection category/statement

Hi, maybe you could consider making it possible to use Media Search within a specific category (including, or not, the subcategories). For example: all the images within Category:Dogs, and within its subcategories, that have a depicts statement with ball (Q18545). It might be an interesting thing to develop. Christian Ferrer (talk) 08:51, 3 October 2020 (UTC)
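
For what it's worth, a similar intersection can already be written as a search query by combining the CirrusSearch category keywords with `haswbstatement`. The sketch below only composes the query string; the function name is made up, and whether MediaSearch handles these keywords the same way as Special:Search is an assumption.

```python
def category_depicts_query(category: str, qid: str, include_subcats: bool = True) -> str:
    """Compose a query for files in `category` that depict `qid`.

    Hypothetical helper. `deepcategory:` searches the category and its
    subcategories (to a limited depth); `incategory:` searches only the
    category itself. P180 is the "depicts" property.
    """
    keyword = "deepcategory" if include_subcats else "incategory"
    # Quote the category name so that names containing spaces stay intact.
    return f'{keyword}:"{category}" haswbstatement:P180={qid}'


# Images in Category:Dogs or its subcategories that depict a ball (Q18545).
print(category_depicts_query("Dogs", "Q18545"))
# The same, restricted to files directly in Category:Dogs.
print(category_depicts_query("Dogs", "Q18545", include_subcats=False))
```

A dedicated UI filter, as requested above, would still be easier to discover than typing the keywords by hand.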

Concerns...

Out of curiosity I tried Brazil [1], and Chesus, this is bad!

Prostitution, homeless people, beaches (that are not even in Brazil), an elephant!! This search reinforces stereotypes. Most of the photos were not even taken by Brazilians.

Move on to more important topics:

  • This "popularity" is quite bad.
It will shift the curve toward photos that already get some attention, and widen the gap to other photos. It would be better to finally embed FP, VI, and QI in the search, and to prioritize the new ones, creating more variety. Also, being popular is not necessarily good; letting a machine bring photos to us is not working.
  • Two clicks $%#$¨¨¨&!!
Again, a new feature that increases the steps! What is wrong with you guys?
Seriously, all the recent changes add steps to get anywhere: for "contributions" I now have to click to open the search bar; for structured data I need to click a bar to open it... The UX here was always terrible, but you are making it worse!
Why not put the photo, the author, and the license in the same place?
Currently I'm one click from the page that I want; I do not need to click twice. And in the first three that I gave as examples, I have a link to the highest resolution, the author, the license, and a download link, all in the same place.
If you want to keep the info panel and the enlarged view, okay, but add a clickable link below the image. Not everyone will use Ctrl+click, and on mobile, well, TIFFs do not work and the two clicks are mandatory.
Make the things easier.
  • What's large?
For me, large means 30 MB pictures and 4K videos... My five-year-old cheap mobile phone produces 4032×3024 pictures, so how is 1920×1080, the standard screen size, large?
It would be better to have a way for us to specify the size that we want to find.
  • Mapped search
I ran WLE this year, and the community requested that the protected area not be included as a depicts statement. But "located at" was a good idea that worked for this, and we already have tons of geolocated images.
But we do not have a search that shows us photos near a location.
We also do not have a map with all entries for a particular subject, which would be great for science, for example to show the distribution of a bird based on our photos.

That is it for now. -- Rodrigo Tetsuo Argenton m 11:55, 3 October 2020 (UTC)[reply]

Hello and thank you for your feedback! Regarding your concern about two clicks, I can explain why we made this choice. We ran user research studies and usability tests on this experience without the "quick view" (the panel with more information that comes out on the right). We also tested this design with more information below the images like you've suggested. The majority of users were frustrated by needing to load an entirely new page to see the image larger and get the information they needed. There also wasn't a consensus on what information was the most important for each user, especially for the overwhelming majority of users who are new to Commons. Putting a direct link to download the image before visiting the file page also can be problematic as many Commons contributors want users to read over the information on the file page before downloading. In general, putting everything that everyone wanted below the image became messy and hard to scan, especially when that information is very inconsistent across Commons. The quick view panel is a common user experience pattern for an overwhelming majority of image search traffic across the internet and was repeatedly asked for due to these expectations. So for many users this saves time and bandwidth to quickly access a majority of the metadata for each image while being able to continue searching on the same page. That being said, I’d love to run a few experiments in the future around putting information below the image as you have in your example and appreciate your opinion and passion around this project. We have plenty of room for improvement and iteration as more people begin to use this. MWilliams (WMF) (talk) 18:35, 5 October 2020 (UTC)[reply]

Add the Total Number of results available to UI

In the results UI, I'd like to see a "Results 1 – 20 of 23,867" indication like we get with a standard search. That way, if there are too many results for my needs or expectations, I will know as soon as the page has first loaded whether I need to refine my search criteria. It will also give me a clue about how many more results are yet to be loaded when the "Load more" button appears. Thanks! Quiddity (talk) 21:14, 16 November 2020 (UTC)
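
To make the request concrete, here is a minimal sketch of how such a label could be computed from values a paginated result list typically has on hand. The function and parameter names are hypothetical and do not reflect how MediaSearch is implemented.

```python
def result_range_label(offset: int, shown: int, total: int) -> str:
    """Format a "Results X – Y of Z" label (hypothetical helper).

    offset: number of results before the first one on screen
    shown:  number of results currently loaded after the offset
    total:  total number of matches reported by the search backend
    """
    if total <= 0 or shown <= 0:
        return "No results"
    first = offset + 1
    # Never claim to show more results than actually exist.
    last = min(offset + shown, total)
    return f"Results {first:,} – {last:,} of {total:,}"


print(result_range_label(0, 20, 23867))   # first page
print(result_range_label(0, 40, 23867))   # after one "Load more" click
```

With a "Load more" button the first number stays at 1 and only the second grows, which also answers how many results remain to be loaded.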

Search preference survey

I've posted a quick survey for users to take about which search experience they prefer on Commons. Please take a moment to look it over and participate if possible, it will be open for about three weeks. Keegan (WMF) (talk) 21:08, 17 December 2020 (UTC)[reply]

Reminder

There is still time to take the quick Media Search survey on which search experience you prefer using on Commons, Special:Search or Special:MediaSearch. The survey is only one question–which search do you prefer–and will just take a moment to fill out if you're interested. Thanks! Keegan (WMF) (talk) 19:07, 5 January 2021 (UTC)[reply]

@Keegan (WMF): Can we change our answer after we submit it? I'd like to set my answer to "no preference" for now, but I plan to test out both search tools someday before the deadline. Thanks, pandakekok9 05:51, 14 January 2021 (UTC)[reply]
Ah nevermind, I didn't see this: The survey can only be taken once, and it will not appear again after being taken. pandakekok9 08:00, 14 January 2021 (UTC)[reply]

MediaSearch does not respect file name / description changes

copied from mw:Help talk:MediaSearch

See https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=Matterhorngletscher: the file File:Aus der Hörnlihütte.jpg is the result of a rename, as both the name and the description were wrong. But it is still found by MediaSearch, although as far as I can see, only the redirect could be the reason for that.

Sometimes a file is renamed, because the name indicates the wrong thing. After such a rename, MediaSearch should not find the file by the old term. I would skip the evaluation of redirects / renames from MediaSearch, as there always should be a reason to rename, renaming normally is a refinement, and SDC are not touched automatically by a file rename. --Herzi Pinki (talk) 18:13, 14 January 2021 (UTC)[reply]

Huh, interesting find, thank you for that. Keegan (WMF) (talk) 20:20, 26 January 2021 (UTC)[reply]
I now have filed DRs for both redirects. Proud to support MediaSearch in improving results. --Herzi Pinki (talk) 20:14, 22 June 2021 (UTC)[reply]
Often the file has been moved for reasons other than that the name indicated the wrong thing. In these cases the file should absolutely be found also under the old name. Anything else is utterly confusing. Otherwise, if I found a nice file, note its filename, and later try to find it, it would be hidden. I would still be able to find the redirect by using the URL or Special:Search, but normal users should not need to learn those means. –LPfi (talk) 14:08, 23 June 2021 (UTC)[reply]

organizational non-content categories; wikidata

copied from mw:Help talk:MediaSearch

Under tab Categories and pages I find User:OgreBot/Uploads by new users/2019 February 07 18:00. This is silly.

And I do not find Wikidata entries at all. --Herzi Pinki (talk) 18:13, 14 January 2021 (UTC)[reply]

@Herzi Pinki: sorry for the delay in reply. There are no current or future plans to make searching Wikidata within MediaSearch possible, as that is too big of a technical challenge. The Commons Query Service may be able to suit some of your needs if you know what it is you're looking for in a query, as opposed to a search.
However, to your first point, there is a namespace selector filter that will be deployed soon that should take care of the issue that you're describing here. You'll have the ability to exclude these sorts of results if they're not what you're looking for. Keegan (WMF) (talk) 18:48, 18 February 2021 (UTC)[reply]

justification and cropping images

https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=Wurtenkees crops the images to justify the row of 3 images. IMHO there is no need to crop the images (in that extreme way) just for the sake of justification. I'm on a wide screen. --Herzi Pinki (talk) 18:25, 14 January 2021 (UTC)

This was an intentional design decision. While it's true this cropping may not be needed for wide screens, most users (including myself) are not yet using large monitors. Cropping and justifying the results allows for more results to fit on a page and makes the tool much more usable on "average" size screens and laptops. It's always possible to revisit these design decisions in the future, but this is purposeful. Keegan (WMF) (talk) 18:53, 18 February 2021 (UTC)[reply]
The problem is that this makes it very poor as an image search tool, since without clicking you have no idea whether the poor composition is an issue with the photo or with the search engine. Geni (talk) 13:27, 26 February 2021 (UTC)

Updates and moving towards a default state

@Strakhov, Sadads, Jarekt, Donald Trung, ChristianKl, Christian Ferrer, Multichill, Djiboun, Juandev, and Rodrigo.Argenton: @Herzi Pinki, GPSLeo, PKM, Syced, Mike Peel, GerardM, Ayack, Spinster, Kaldari, EugeneZelenko, and Jmabel: @Jmabel, Julle, and Quiddity:

Greetings,

Thanks to all of you for leaving comments, questions, and concerns over the past year as this tool has been developed.

There are some new feature updates to Special:MediaSearch, and only a few left to implement. Thanks in part to the feedback that everyone has left here, the team thinks that Media Search is fast approaching the point of being able to replace Special:Search as the default search for Commons (Special:Search will remain available, and there will be a preference to keep that page as the primary search experience). Please let us know if there are any outstanding usability or design issues you think might need to be addressed as we move forward. I expect to be able to give more information about plans to the broader Commons community next week. Keegan (WMF) (talk) 18:51, 19 February 2021 (UTC)

  • @Keegan (WMF): It looks ... rather complex and segregated. If you want it to replace the main search engine, what's the basic result that will be returned when you enter a search query and press return? If I try searching for Lovell Telescope, I get images, but they are rather random. I'm normally interested in getting to the category, but that's the 5th option? How can I see a set of results that says 'Here's the most relevant images, here's some audio files, and here's the category where you can find more'? Thanks. Mike Peel (talk) 20:57, 22 February 2021 (UTC)[reply]
  • Yes, the segregation is part of the point; it vastly raises the usability of the interface. Compare the query you're using to the old search. The category link in the search results is halfway down the page as the eleventh option, sandwiched between small media thumbnails, in addition to the link at the top, which has a similar function to the tabs in the new media search. Generally, when using Special:Search here on Commons, you're hit with a barrage of image files, PDFs, wiki pages, and any number of other results that are jumbled together. Keegan (WMF) (talk) 20:07, 23 February 2021 (UTC)
    Unless I am mistaken, once upon a time, searching for “Lovell Telescope” would have sent Mike straight to the category: back in May 2013, the SearchExtraNS extension was activated, and as far as I remember it stayed for years. I am not sure when nor why it was eventually disabled, but to my recollection it seemed to be at the same time as the depicts search was enabled in the search box. Jean-Fred (talk) 23:51, 23 February 2021 (UTC)[reply]
Thanks for the feedback, @Jean-Frédéric: it seems like you're referring to the change made in 2019 in phab:T235263, which removed redirects directly to a matching page title. Those redirects were often sending users to gallery pages which did not accurately represent the breadth of available files. There is still a preference to allow users to enable that redirect. CBogen (WMF) (talk) 16:40, 24 February 2021 (UTC)[reply]
(Village pump announcement about this change, Search help talk page post) mw:Extension:SearchExtraNS is still installed and enabled here, FWIW, serving up search results from a few Commons-specific namespaces. Keegan (WMF) (talk) 17:40, 24 February 2021 (UTC)[reply]
Thanks for the clarifications @CBogen (WMF) and Keegan (WMF): The rationale given to not send people to Dog made sense to me at the time, and still does to a degree. However, I had not fully groked that this would mean that searching for “Lovell Telescope” would not send me straight anymore to Category:Lovell Telescope − which I think would still be a desirable behaviour. Jean-Fred (talk) 11:01, 25 February 2021 (UTC)[reply]
@Jean-Frédéric: the only thing that was removed was search results redirecting to main namespace pages; so if Lovell Telescope existed on Commons then search would have taken you there instead of the search results page. We didn't touch category pages. I'm not familiar with a time that a search result would take me directly to a category page, I can't recall having had that experience personally with my (mostly) vanilla Commons preferences. Keegan (WMF) (talk) 21:00, 25 February 2021 (UTC)[reply]
@Jean-Frédéric: I had a similar memory to @Keegan (WMF): it worked for galleries but not very well for categories (which was a problem with the old system). You could (and in fact, still can) search for "Category:Lovell Telescope" and find the category that way - but the new search returns images by default, even with 'Category' in the search term, which isn't so good. Perhaps it will still display the category link in the pop-down menu in that case, though? Thanks. Mike Peel (talk) 08:36, 26 February 2021 (UTC)[reply]
  • @Keegan (WMF): That makes sense, but it makes the (unintentional?) impression that Commons is for images, rather than for all media files. I think it's *good* that we say "we have PDFs, we have audio files, we have other media related to this search as well" rather than just "here are the images, click on the links above and we may or may not have other files". I'd definitely like to see categories highlighted more (appearing in the 11th place isn't great), but this goes the other way and completely hides them from sight from most search users. That may be a good thing in the long term - it would be great if search and depicts could replace categories entirely - but we're not there yet.
    I'd much prefer the basic search returns a mix of contents, with obvious links to just show images etc., but that goes against your aims I guess. Failing that, could you put a number next to the links to the other options, to clearly demonstrate that they have relevant results as well. And I would really like to see a prominent note saying that there is a relevant category (and if available, gallery) for the search term, so that it's more obvious that they are available. At least using exact name matching, but ideally displaying the top category in a search result so that it benefits from multilingual support in the infobox metadata. Thanks. Mike Peel (talk) 19:12, 25 February 2021 (UTC)[reply]
Thanks for this perspective. We'll keep an eye on the MediaSearch metrics to ensure that we continue to see improvements in the ability for users to find what they're looking for. Meanwhile, I've filed phab:T275900 to track the request to show whether the other tabs have results relevant to the search query. CBogen (WMF) (talk) 19:15, 26 February 2021 (UTC)[reply]

When we are to know its effectiveness in other languages, we need metrics that show the use of Commons based on the language used in search criteria. Thanks, GerardM (talk) 05:43, 23 February 2021 (UTC)[reply]

Commons is an English language website. Its future search engine will support all our languages. Images are needed in any and all of our languages. Without a plan, this aspect of the search engine will either be an item on a tick list or of profound importance to all our projects. With proper attention we will get more pictures found in Commons itself instead of being copied from articles in other languages. We will have students from all over the world looking for images and freely licensed ones at that.

Release announcement posted

I've provided information over at the Village pump. Keegan (WMF) (talk) 20:37, 23 February 2021 (UTC)[reply]

@Keegan (WMF): "fast approaching" and "this will be made live next month" are quite different things, one implies that the process is still open for iteration, the other gives a deadline by which it has to be acceptable. The first approach is much better. Thanks. Mike Peel (talk) 19:16, 25 February 2021 (UTC)[reply]
@Mike Peel: understandable observation. However, the two are not mutually exclusive. The team is at a point where they believe it's ready to serve as the default landing for search. They also believe that they can continue to iterate and make improvements as needed before, during, and after the software launches. I hope that clarifies things. Keegan (WMF) (talk) 19:38, 25 February 2021 (UTC)[reply]
@Keegan (WMF): That makes sense, but in general, 'believe' is often not the same as reality. Please don't feel afraid to say 'it's not ready yet, let's wait a bit longer' or 'we want to implement this first' rather than targeting deadlines. Thanks. Mike Peel (talk) 19:44, 25 February 2021 (UTC)[reply]

Media size

The media sizes (All image sizes, Small, Medium, Large) don't appear to be very useful. I selected Large and it found images of only around 1.6MP. I'm guessing it was just looking for images with any dimension > 1000. Compare with Google's Advanced Image Search which in addition has MP options going up to >70MP. What are the use-cases you considered for size? Given that it is possible to downsize or crop a larger image, the main case for small sizes would, I think, be to focus on those looking for icons in PNG or GIF format (which is exactly what Google calls its smallest size). I suggest you consider some options for larger image JPGs.

  • HDTV (1920×1080)
  • Ultra HDTV (3840 × 2160) (aka 4k)
  • > 5MP (about the resolution you need to print adequate quality on A4/US letter/magazine page)
  • > 10MP (high quality printing)
  • > 20MP (high resolution image)

At the moment, the highest setting, Large, is so low it doesn't filter Commons JPGs to any useful degree. PNG or GIFs are a different matter, if they are mainly used for icons or web art. Video files likely deserve their own size options similar to (standard dev, HDTV, Ultra HDTV) - perhaps similar to what YouTube offers for viewing choices. -- Colin (talk) 11:15, 24 February 2021 (UTC)[reply]

Thanks for this feedback, @Colin: currently, as you suggested, the image size filter categories correspond to pixel size. Small is < 500px, Medium is 500-1000px, and Large is >1000px. We modeled these categories after the categories in Google's Image Search (though not the Advanced version). We're open to making changes here and I'd love to hear from more folks about what image size filter categories would be useful to them. CBogen (WMF) (talk) 20:03, 24 February 2021 (UTC)[reply]
Megapixel is a better metric than number of pixels on a side as it copes better with panoramas. I'd suggest something like small being <3Mpix, medium 3-10Mpix, large >10Mpix. Most photos by current generation cameras (and previous generation good cameras) would fall under 'large', so hopefully there would still be a reasonable number of results for all options. Thanks. Mike Peel (talk) 19:22, 25 February 2021 (UTC)[reply]
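To make the difference between the two proposals concrete, here is a minimal Python sketch. The pixel thresholds are the ones CBogen (WMF) lists above and the megapixel thresholds are the ones Mike Peel suggests. Applying the pixel thresholds to the longer side is an assumption (the thread does not say which dimension MediaSearch checks), and the function names and sample dimensions are made up for illustration.

```python
def size_by_longest_side(width: int, height: int) -> str:
    """Pixel-based buckets: small < 500px, medium 500-1000px, large > 1000px.

    Assumption: the threshold is applied to the longer side of the image.
    """
    longest = max(width, height)
    if longest < 500:
        return "small"
    if longest <= 1000:
        return "medium"
    return "large"


def size_by_megapixels(width: int, height: int) -> str:
    """Megapixel-based buckets: small < 3 MP, medium 3-10 MP, large > 10 MP."""
    megapixels = width * height / 1_000_000
    if megapixels < 3:
        return "small"
    if megapixels <= 10:
        return "medium"
    return "large"


# Illustrative dimensions only, not taken from real files.
for w, h in [(1600, 1000), (4000, 600), (6000, 4000)]:
    print(w, h, size_by_longest_side(w, h), size_by_megapixels(w, h))
```

Under these assumptions a 1.6MP image and a narrow 4000×600 panorama both count as "large" by side length but "small" by megapixels, which is the mismatch described in this thread.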

Faceted search for location of creation

I filed phab:T275787. Multichill (talk) 16:26, 25 February 2021 (UTC)[reply]

May need to de-prioritise usernames in searches

For example "hilsea" mostly produces decent results but also pulls in a couple of rather random images due to author names. Geni (talk) 17:40, 26 February 2021 (UTC)[reply]

Where is the redlink?

I entered a category name in the search box and clicked "Go" (not "Search", which I also have; one main reason for me to use Monobook is that it gives me both) to create the new category. I end up in Media search, I am told "We didn't find any results – Try more general terms or check your spelling". No redlink that I can see, no obvious way to create the category without either going on to Special:Search or start editing the URL.

How is this supposed to work?

LPfi (talk) 13:14, 15 May 2021 (UTC)[reply]

@LPfi: You are correct, Special:MediaSearch doesn't display redlinks for searches that come back empty. If you need this behavior from Special:Search, there is a link in the right-hand corner of page to switch back to the traditional search. There is also an option in user preferences that you can check if you'd prefer to always use Special:Search from the site-wide searchboxes. EGardner (WMF) (talk) 21:29, 27 May 2021 (UTC)[reply]
So only seasoned users well-versed in Mediawiki wikis can create categories, and others should not be advised to do it, unless given detailed instructions any time such advice is given. This probably affects galleries in the same way.
Was there a deliberate decision to limit category and gallery creation to seasoned users? Was it discussed at Commons?
LPfi (talk) 08:19, 28 May 2021 (UTC)[reply]

And now I tried to use the shortcut Deletion requests. It exists as a redirect and turns up in the search hit list at the search box, but I am thrown to the media search whether I click the redirect listed or type in the redirect name. Of course the Commons page doesn't show up in the search results, as the search is for media. If I choose "Categories and Pages" I get an invalid search error.

Does this mean that to find policy pages you have to know you have to prepend "Commons:"? So are all redirects from the main namespace now unusable for people using the recommended search tool, and very confusing: first a page is found, but when I click the name it is no longer there. Should they all be deleted?

How are ordinary users supposed to search for policy pages? It works if you prefix your search with "Commons:". To change the namespace you have to click "custom" to get to see the grayed-out options, clearly an "advanced users only" list, a very long one with tiny font to underline the impression. The namespace is called "Project", a name never used here, so if you are searching for "Commons" (perhaps you misspelled it in your original search) you won't find it. With namespace: all, the Commons namespace is seemingly given low priority.

LPfi (talk) 15:03, 7 June 2021 (UTC)[reply]

Thanks for the feedback. We are looking into this and you can track it in T285168. CBogen (WMF) (talk) 19:50, 22 June 2021 (UTC)[reply]
Thank you. –LPfi (talk) 14:00, 23 June 2021 (UTC)[reply]

Searching for a media ID that's part of a URL and/or template parameter

Copy from the main village pump:

When I search for "ggbain.34249" here at Commons, I only come up with the image I just loaded. But looking through the category, the image already existed as File:A. Santos Dumont LCCN2014714401.jpg with "ggbain.34249" in the text. Why didn't I find it? Is my search only returning the string when found in the title? --RAN (talk) 18:21, 22 September 2021 (UTC)[reply]

Mediasearch indeed only finds File:Alberto Santos-Dumont circa 1910 by H. Manny via Bain (ggbain.34249).jpg, which has the exact ID ggbain.34249 in the file name. It should probably also find:

This seems important for avoiding duplicated uploads from archives. Cheers, --El Grafo (talk) 09:10, 23 September 2021 (UTC)[reply]

Image search - mixed results

I searched for images using text "GWR 3031" [2] and it returned 16 images of steam engines (as expected, and all appropriate), eight pictures of flowers, and half a moth. I can find no reason why the flower/moth images have any relation to "gwr 3031".--Verbarson (talk) 19:01, 28 October 2021 (UTC)[reply]

Difficult to find my own uploads

@Keegan (WMF) @CBogen (WMF) Again and again I have difficulty finding my own uploads with MediaSearch. I am really worried about how other people, who (unlike me) do not know that a specific file is available on Commons, can find it. As an example: Searching for "Demo das nordische modell vor dem hurentag" does find "Demo für das Nordische Modell vor dem Hurentag Berlin 2021 1.jpg" as expected. But it is one of the last results. The first and majority of results are images of the Hurentag-Demo. So MediaSearch returns just the opposite of what I was searching for (images of the pro demo instead of the anti demo), while the wanted results ("Demo für das Nordische Modell vor dem Hurentag Berlin 2021 1.jpg") are a near complete match of the search string ("Demo das Nordische Modell vor dem Hurentag"). Other cases are even worse: An image is not found at all, even though the search term is actually part of the file name. (The same happens when the search term is part of the file description of a file that is not found by MediaSearch.) C.Suthorn (talk) 07:43, 24 November 2021 (UTC)[reply]

Maybe this is not such a good example: The names of all files found by that query match your search string equally well. How is the search function supposed to know whether you mean "Demo gegen das …" or "Demo für das …" when you type "Demo das …"? This seems to be more of a sorting problem. But in general, I agree: there are a bunch of things that should be found but are not. Strings on file description pages being one of them (see above). --El Grafo (talk) 13:38, 24 November 2021 (UTC)[reply]
Thanks for reporting this issue. As @El Grafo mentioned, this specific example is working as intended. However, if you can share some other examples of strings that aren't being found, I'd be happy to look into it. CBogen (WMF) (talk) 14:19, 24 November 2021 (UTC)[reply]
File:Birkenstraße 62 Eingang von Haus N hinter rot-weißer Schranke in Berlin Moabit, Gewaltschutzambulanz der Charité, forensische Dokumentation körperlicher Gewalt auch ohne Anzeige bei der Polizei. 030 450 570 270, erreichbar U-Bahn U9.jpg
  • + Birkenstraße 62 Eingang von Haus N hinter rot-weißer Schranke in Berlin Moabit, Gewaltschutzambulanz der Charité, forensische Dokumentation körperlicher Gewalt auch ohne Anzeige bei der Polizei. 030 450 570 270, erreichbar U-Bahn U9
  • - Eingang von Haus N
  • + Birkenstraße Eingang
  • - Gewaltschutzambulanz [!]
  • + hinter rot-weißer Schranke
  • - hinter Schranke
  • + 030 450 570 270
  • + erreichbar U-Bahn U9
  • - erreichbar U9
  • - körperlicher Gewalt [!]
  • - körperliche Gewalt
  • - koerperliche Gewalt
  • - koerperlicher Gewalt
  • + auch ohne Anzeige bei der Polizei
  • + auch ohne Anzeige
  • - ohne Anzeige
  • + ohne Anzeige Polizei
  • - ohne Polizei
  • - forensiche Dokumentation [!]
  • - Charité Polizei
  • - Charite Polizei [!]
  • - Charité Dokumentation
  • - Charité forensiche
  • - Charité Anzeige [!]
  • - Charité Gewalt
  • + Charité ohne Anzeige
  • + Moabit Polizei
  • ! Berlin Moabit Polizei [result image stretched - not usable]
  • ! Haus N [too many results]
  • ! Haus N Moabit [too many results]
  • - Haus N Charité
  • - Haus N Gewalt
  • - Haus N Anzeige [!]
  • + Haus N ohne Anzeige
  • + Haus N U-Bahn
  • ! Haus N U9 [too many results]
--C.Suthorn (talk) 18:55, 24 November 2021 (UTC)[reply]
@C.Suthorn: Thanks for the many examples! This is indeed a regression & we already have a fix merged - it should be deployed next week. I've run through most of your examples and have confirmed that after this fix, that image will indeed start to show up for those searches. We're tracking the bug here: https://phabricator.wikimedia.org/T294953 -- Mmullie (WMF) (talk) 09:39, 25 November 2021 (UTC)[reply]
I tried 3 examples just now and found no difference ("ohne Polizei" and "Gewaltschutzambulanz" do not work, "auch ohne Polizei" works. As before). --C.Suthorn (talk) 17:44, 8 December 2021 (UTC)[reply]
@C.Suthorn: I don't believe the fixes have made it to production yet. Please check again later this week, thanks! CBogen (WMF) (talk) 19:15, 8 December 2021 (UTC)[reply]
Still does not work. Also: Image is not found with "zludhxhnmkn" as search term (term is part of description, meta-data, uploaded file, upload comment and does not appear anywhere else in commons). --C.Suthorn (talk) 09:49, 13 December 2021 (UTC)[reply]
@C.Suthorn It seems that last week's deploy was rolled back because of some production errors, so while the fix should have been in place yesterday it is not in place today. I'll post again when it's been re-deployed. CBogen (WMF) (talk) 13:42, 13 December 2021 (UTC)[reply]
@C.Suthorn The fixes are once again in production. A quick check of some of your search terms looks like it's working properly now, but please take a look and let me know. Thanks! CBogen (WMF) (talk) 19:20, 13 December 2021 (UTC)[reply]
It is significantly better now. But still: It is irritating that "Gewaltschutzambulanz" works but "gewaltschutz" does not, "rot-weißer Schranke" works but "rot-weiße Schranke" does not, and "U-bahn u9 forensisch" does not work. C.Suthorn (talk) 21:09, 13 December 2021 (UTC)[reply]

Search by structured data?

I would like to search for images by structured data fields. E.g. find all images that "depicts female organism". Is this possible in MediaSearch or by any other means? I've not been able to figure this out either by looking in the Help centre or by trial and error. Any help would be appreciated. --Tagooty (talk) 08:12, 8 February 2022 (UTC)[reply]

Hi! You can do this with the syntax haswbstatement:P180=Q146. Hope that helps! CBogen (WMF) (talk) 14:35, 8 February 2022 (UTC)[reply]
Thanks, this is useful! Is there an easy way to find out the options for Pxxx and Qyyy, other than opening Wikidata? Is there a way to search for "female organism" instead of Q43445? --Tagooty (talk) 15:30, 8 February 2022 (UTC)[reply]
Sorry, not that I'm aware of. As far as I know, in order to use this syntax you need to use Pxxx and Qyyy, and the only way to find the options right now is to search Wikidata. CBogen (WMF) (talk) 15:32, 8 February 2022 (UTC)[reply]
Ok, thank you --Tagooty (talk) 15:35, 8 February 2022 (UTC)[reply]
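For anyone who wants to script such searches, here is a minimal Python sketch that builds a MediaSearch URL for the `haswbstatement:Pxxx=Qyyy` syntax mentioned above. The `title=Special:MediaSearch` and `type=image` parameters are the ones visible in MediaSearch URLs quoted elsewhere on this page; the `/w/index.php` path, the `search` parameter name, and the helper name `statement_search_url` are assumptions made for illustration.

```python
import re
from urllib.parse import urlencode

# Assumed base URL of the Commons index.php entry point.
BASE_URL = "https://commons.wikimedia.org/w/index.php"


def statement_search_url(property_id: str, item_id: str) -> str:
    """Build a MediaSearch URL for files carrying the statement Pxxx=Qyyy."""
    if not re.fullmatch(r"P\d+", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    if not re.fullmatch(r"Q\d+", item_id):
        raise ValueError(f"not an item ID: {item_id!r}")
    params = {
        "search": f"haswbstatement:{property_id}={item_id}",
        "title": "Special:MediaSearch",
        "type": "image",
    }
    return f"{BASE_URL}?{urlencode(params)}"


# P180 is "depicts"; Q43445 is the "female organism" item mentioned above.
print(statement_search_url("P180", "Q43445"))
```

As noted in the replies above, the P and Q identifiers still have to be looked up on Wikidata first; the sketch only assembles the query.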

Images only found with search using Q-number

I was searching for files with the statement located in protected area (P3018) Boddenküste am Strelasund (Q61452390) on the files. When I search for the name "Boddenküste am Strelasund" no files are found. If I search for "Q61452390" the files are found as expected. It looks like there is an issue with the link from the name to the item. --GPSLeo (talk) 13:19, 25 April 2022 (UTC)[reply]

Clothing

right now if you search "clothing" File:Clothing&Accessories.jpg is the first one showing up. how does your algorithm work?? RZuo (talk) 13:46, 5 May 2022 (UTC)[reply]

I do not have the slightest idea about how the search algorithm works. But: The filename includes the word "clothing", the information box includes the word "clothing" twice, its caption includes the word "clothing", the category includes the word "clothing". It indeed depicts clothing. It has a structured data statement saying it depicts clothing and it even has the word "clothing" written in the image itself. IMHO it's pretty logical that this image is delivered as a result of the search "clothing". Maybe quality images (featured pictures and so on) depicting "clothing" could rank first. Do they exist? Strakhov (talk) 08:51, 7 May 2022 (UTC)[reply]
@Strakhov: thx for your idea!
i just did some tests. it seems mediasearch puts quite a lot of weight on structured data statements.
i just renamed File:Clothing with black and white stripes worn by Azul, clothing model (33396265845).jpg using twice the word clothing, spammed its caption and wikitext with the word, and made depicts=clothing. now it's no.7 in mediasearch of "clothing".
File:Wolford fatal dress as a mini dress to cover the see trough part of a too thin jeggings.jpg is also worth noting. it's now no.8 of the search results. the only place it has the word clothing is its depicts statement. RZuo (talk) 10:01, 7 May 2022 (UTC)[reply]
by adding depicts=clothing to File:Chiang Mai, Thailand, Colorful clothing.jpg, it now becomes no.16 of the results. RZuo (talk) 10:06, 7 May 2022 (UTC)[reply]
Wolford fatal dress as a mini dress to cover the see trough part of a too thin jeggings.jpg is a quality image, that's the reason it ranks so high. Strakhov (talk) 10:21, 7 May 2022 (UTC)[reply]

Date and recency (and feedback options)

Hello, I was searching for images by date and then I realised... there are two dates, creation and upload... And now I wanted to order the search results according to both of them...

Currently you have an option to order by "recency". Recency orders by upload date.

Then when you click an image you get a limited preview. It shows creation date... I'd like, sometimes, both those dates in the preview as well as to order the search by either date.

Also when I realised I had feedback for the new search, I looked around and didn't spot a feedback button. So I clicked "help" near the top left of the search result area (as have others). "Help" takes you to the MediaWiki dev area. That is not where they want this feedback... This commons page is for the feedback, right?

I suggest adding a feedback button beside the help button, instead of sending people off site and back and stuff. Thanks, ~ R.T.G 01:29, 17 June 2022 (UTC)[reply]

+1 --C.Suthorn (talk) 16:12, 3 September 2022 (UTC)[reply]
For a date filter see phab:T329961. Prototyperspective (talk) 23:34, 15 October 2024 (UTC)[reply]

Bad search result, and some sorting suggestions (used in Wikistories)

So, as some of you might know, WikiStories (project Inuka) would rely heavily on MediaSearch as users try to find illustrations for their stories. Here's my first try with the tool, and it already ran into a problematic bug in the MediaSearch results (https://id.m.wikipedia.org/wiki/Istimewa:StoryBuilder/Story:Kota_Surakarta -- you need to log in and activate WikiStories in your Beta preferences in Indonesian Wikipedia). I tried to add an image of the city, and the default result was a mess. I've submitted ticket phab:T311834 on how to reproduce this error in Commons. I will copy the report and suggestion below:

List of steps to reproduce (step by step, including full links if applicable)
What happens?
  • Some of the search results include:

None of them have anything to do with Surakarta City, and nowhere in the title, description, meta, matches any resemblance with "Surakarta" or "City" ("Kota", in Indonesian).

I happened to find this bug when I was creating a Wikistory in Indonesian Wikipedia:

https://www.mediawiki.org/wiki/Topic:Wyh8f0obblz9wdd9

I suspect this has to do with the fact that en:Surakarta is colloquially known as "Solo" for short. But how did this information seep into the search results?

What should have happened instead?
  • Display results related to the city, e.g.
  • Nothing unrelated to "Kota Surakarta" should be displayed
  • Images with titles, description, category (and subcat), matching the search term should have greater weight, and displayed on the top, therefore, images with neither title, description, category (and subcat), matching the search term should be pushed to the very last of search result.
  • Ideally, images should be sorted by usages in projects. Greater usages = better quality image (hopefully). Other consideration for extra weight would be: Picture of the Day status, title matching exactly the search term, how old is the file, multiple occurrences of the search terms in the title and description and categories, whether the image is in the top category or way deep in the subcategories.
  • Images in the category with Exact match as the search term should be displayed


OK, so I just found out from the mw:Help:MediaSearch#Statements_and_structured_data that the search result is using the Wikidata aliases.

> This has the potential of drastically expanding the amount of results returned, because entities already cover synonyms (via Wikidata aliases) and language differences (via labels & aliases in multiple languages): a file only needs to be tagged with one depicts statement per item, and search will be able to find that statement and any of its aliases or translations.

So I think this should be the problem with sorting, and weight of each result. Maybe the search term should be given more weight, and the aliases (much) less weight? Maybe unused files should not be given in the top 10 (or top 100) result?

Furthermore,

> Note: not all entities are considered equally in search ranking. When searching for "iris", users are likely expecting to find multimedia that depicts the genus of plants (Q156901), or maybe the part of an eye (Q178748), but probably not Iris Murdoch, the British writer and philosopher (Q217495).

> Based on the similarity to the search term and the importance/popularity of the entity, Media Search will boost multimedia with certain entities more than others.

I didn't see this being the case, as we can see, the search result gave several unused files, and several people with the same name as the alias ("Solo"), and less of the main topic itself.

Bennylin (yes?) 12:48, 1 July 2022 (UTC)[reply]

Hello @Bennylin, thanks for raising this issue. We do media searches based on (in descending order of how we weight each search field):
  • any "depicts" statements an image has
  • the image title
  • the image category name (not subcategories)
  • the image captions in the user's interface language
  • if there's a redirect to the image, then the title of the redirect
  • the text on the page
We search for the search terms the user has entered, plus any synonyms we can find, giving the synonyms half the weight of the user-entered search terms. Note that the scores for individual search terms depend not only on the weights we've given to the fields, but also on how often a search term appears within the fields we're searching, and how often it appears anywhere on Commons (see en:Okapi BM25 if you want to find out more).
Then once we have all the results, we rescore the top ~8k results, giving an extra boost to any files with templates like {{Quality image}}, {{Valued image}}, etc. The weights for the search terms are a result of optimising for "good" results coming before "bad" results from a training dataset of 14k search results in 20 languages.
Just going through some search results for "kota surakarta":
  • The first image matches on "depicts" (see in the "structured data" tab), title, category, caption (because the user's interface language for this search is English), and page text
  • The second image matches title, category name, and the title of a redirect to the file
  • The third image matches title, category name, and page text
  • The fourth image matches the search term and a synonym in the title, search term and a synonym in category names, and the page text
  • The first obviously wrong image we can see is File:Solar Solo.jpg - this matches a synonym ("solo") on the title, category and redirect
  • Similarly the chainsaw image File:Solo-645.jpg matches a synonym on title and category
Adding synonym searching has massively improved our search "recall" (the amount of results returned) for non-English languages, especially less widely-spoken ones - before we introduced it when you searched for (for example) "íaltog" you got very few results, but now the search system knows that "íaltog" is the Irish for "bat", and so can return images with "bat" in the title, or in the category "bats". It may be that synonym searching has reduced our "precision" (meaning the proportion of the total results that are good matches), and the data we have used to build our model is insufficient to show that up. We can try and gather more data, and experiment with different weights for synonyms - any more examples like this that you might have will be very helpful, but please keep in mind that we're balancing making things better for some use cases against making them worse for others.
In the meantime, we also have recently imported a large new dataset based on links between Wikidata and Commons images that we hope will improve our precision. Mind that it won't necessarily exclude the "bad" images you have found, but it should ensure that better images appear sooner in the list of results. We're hoping to start using this new dataset in Media Search within the next couple of weeks.
P.S. We haven't found that whether an image is used on-wiki is a good signal for scoring how good a match an image is. A good illustration of why it hasn't turned out to be useful as a primary search signal is this file - it's used in lots of pages, including the Albert Einstein and Arthur Eddington pages on many wikis, but is not a good match for a search for "Albert Einstein" or "Arthur Eddington". It might turn out to be useful as a secondary signal in the same way that the templates are, but we haven't evaluated that so far. Sannita (WMF) (talk) 17:42, 1 July 2022 (UTC)[reply]
@Sannita (WMF) would it be possible to make the weighting of synonyms somehow adjustable for the user? I don't think a one size fits all setting is realistic. For example, searching for the German term "Hund" (= dog), I get a lot of images of people named "Hund", but a very low number of dog images ... --El Grafo (talk) 13:54, 4 July 2022 (UTC)[reply]
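As a rough illustration of the ranking described above, and of what an adjustable synonym weight could look like, here is a toy Python sketch. It is not the actual MediaSearch implementation: the numeric field weights are invented (only their descending order follows the list above), matching is a plain substring test rather than BM25, the quality boost is modelled as a simple multiplier, and the function name and sample files are hypothetical.

```python
# Invented weights; only the descending order follows the description above.
FIELD_WEIGHTS = {
    "depicts": 6.0,
    "title": 5.0,
    "category": 4.0,
    "caption": 3.0,
    "redirect_title": 2.0,
    "page_text": 1.0,
}


def toy_score(fields, term, synonyms, synonym_weight=0.5, quality_boost=1.0):
    """Score one file: full weight for the typed term, reduced for synonyms."""
    term = term.lower()
    synonyms = [s.lower() for s in synonyms]
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = fields.get(field, "").lower()
        if term in text:
            total += weight
        elif any(s in text for s in synonyms):
            # Synonym hits count for a fraction of a direct hit (half by default).
            total += weight * synonym_weight
    # Stand-in for the rescoring step that favours quality/valued images.
    return total * quality_boost


# Made-up files for illustration only.
city_photo = {"depicts": "Surakarta", "title": "Surakarta city hall", "category": "Surakarta"}
chainsaw = {"title": "Solo chainsaw", "category": "Solo products"}

for weight in (0.5, 0.1):
    print(weight,
          toy_score(city_photo, "surakarta", ["solo"], synonym_weight=weight),
          toy_score(chainsaw, "surakarta", ["solo"], synonym_weight=weight))
```

In this toy model, lowering `synonym_weight` pushes synonym-only matches further down without removing them, which is roughly the trade-off between recall and precision discussed in this thread.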

image of rosenmontagswagen in mainz gets found with search term "rosenmontagswagen mainz", but no search results at all with "rosenmontag mainz"

image of rosenmontagswagen in mainz gets found with search term "rosenmontagswagen mainz", but no search results at all with "rosenmontag mainz" C.Suthorn (talk) 13:44, 31 July 2022 (UTC)[reply]

history

  1. do a media search
  2. look at found media
  3. do another media search
  4. look at found media from second search
  5. use browser history to go back to result page of first search.
  6. not the results of the first search are shown, but the results from the second search
  7. bäh C.Suthorn (talk) 13:47, 31 July 2022 (UTC)[reply]
    Hi there! Would you be willing to file a bug report for this behavior using this link? EGardner (WMF) (talk) 21:40, 28 November 2022 (UTC)[reply]

Yesterday in Münster

I uploaded images of a vigil that took place yesterday in Münster. But before that, I searched for other images that might already have been uploaded. These potential other images may come with any name ("flickr (123).jpg"), with or without categories, with or without SDC. But they would be very likely to mention Münster (the place of the crime and the vigil) and the date (2022-09-02). But there seems to be no way to use MediaSearch to find such images. The first result for "Münster" is "Munster", a completely different town. The other results are subcats of Münster, but not Münster itself! Also, there seems to be no way to find images that depict something at a specific time or timeframe.

What I would have needed was a search that finds everything that is in Münster (identified by the name or the coordinates in the text, or filename, or cats, or SDC, or by any other property) and nothing that is not located in Münster and that also is on 2-9-22 (has 2022-09-02 or 2022-09 or 2022, but not 2022-08 or 2020-09-02 in date field of info template or in cats or in SDC or wherever) and nothing that mentions "2022" and "02" and "09" or "september" but without relating to the time of the event the photo was taken at.

What I got in results was images from Munster and from cities near Münster and from events in 2020 that were uploaded in 2022 and so on. C.Suthorn (talk) 16:24, 3 September 2022 (UTC)[reply]
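The date rule requested above (accept 2022-09-02, 2022-09 or 2022, but reject 2022-08 or 2020-09-02) can be written down as a prefix check on ISO-style dates. The following Python sketch only illustrates that rule; it is not something MediaSearch offers, and the function name `date_matches` is made up.

```python
import re

# A recorded date may be a year, a year-month, or a full day.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def date_matches(recorded: str, wanted_day: str) -> bool:
    """True if a date of any precision is compatible with the wanted day.

    `wanted_day` is a full YYYY-MM-DD date; `recorded` is what a file states.
    """
    if not DATE_PATTERN.fullmatch(recorded):
        # Bare fragments such as "09" or "02" are rejected outright.
        return False
    return wanted_day.startswith(recorded)


for recorded in ["2022-09-02", "2022-09", "2022", "2022-08", "2020-09-02", "09"]:
    print(recorded, date_matches(recorded, "2022-09-02"))
```

A real filter would also need to decide which fields count as the date of the depicted event (info template, categories, SDC), which this sketch does not attempt.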

Search vs MediaSearch

as explained here there is a different behaviour using slash after the Special page (Special:Search/Building vs Special:MediaSearch/Building). can this be fixed? thanks. --valepert (talk) 10:33, 17 September 2022 (UTC)[reply]

How language agnostic search is done?

Do you know if there is any technical description or presentation on how Wikimedia Commons language independent search is done? (ie. what it is in search index, what is machine translated if any etc, how structured data is used? Relevant pointers to Phabricator etc are also ok) (ping @CBogen (WMF) @Keegan (WMF)) -- Zache (talk) 09:40, 19 November 2022 (UTC)[reply]

Hi @Zache: all existing documentation is here. There is some information missing, and we have an open ticket to update the page. I hope that helps! CBogen (WMF) (talk) 19:14, 8 December 2022 (UTC)[reply]

Add a new category to a lot of images altogether

How to add a new category to a lot of images altogether ? For example I get 117 images typing the description "Fables de J. de La Fontaine, illustrées de 120 gravures par J. Désandré et W.-H. Freeman" and I would like to add at once the category "Illustrations of Jean de La Fontaine's fables by J. Desandré and W.-H. Freeman". Thanks in advance. Cquoi (talk) 10:46, 5 December 2022 (UTC)[reply]

By using cat-a-lot. Currently, it only works in the SpecialSearch but it seems like there's a chance it could work with MediaSearch as well soon. Check its preferences after enabling it. Prototyperspective (talk) 23:31, 15 October 2024 (UTC)[reply]
This section is resolved and can be archived. If you disagree, replace this template with your comment. Prototyperspective (talk) 23:31, 15 October 2024 (UTC)

mediasearch dysfunctional (copied from VP)

Here is the thing: On https://article.wn.com/view/2021/01/27/factbox_the_brexit_impact_so_far_8211_paperwork_process_and_/ is a reuse of one of my photos. It is attributed with " Protests against Brexit at the Brandenburg gate in Berlin, Germany on 7 September 2019." So I thought, if I enter "2019-09-07 C.Suthorn" in MediaSearch I will find all the photos I made on that day and published at Commons (this specific image has my name and the date in the description page, in SDC, in the category and in the connected "depicts" Wikidata entry. So MediaSearch has a number of ways to make the connection between the query and the image I was looking for). MediaSearch did return a large number of images by me, many of them about Brexit. But not only did it not return the image I was looking for, it returned none of the images from 2019-09-07 that I published at Commons. It is a search result of images not from that date! What's wrong? C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 09:59, 18 January 2024 (UTC)[reply]

Ping @Keegan (WMF) @CBogen (WMF) C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 14:52, 1 February 2024 (UTC)
@C.Suthorn Thanks for reporting this bug. Would you be so kind as to open a bug on Phabricator and explain step by step what happened? I'll make sure it gets on the devs' radar. Sannita (WMF) (talk) 11:27, 2 February 2024 (UTC)
@Sannita (WMF) phab-ricated. C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 22:27, 3 February 2024 (UTC)
@C.Suthorn Thanks a lot! --Sannita (WMF) (talk) 16:56, 4 February 2024 (UTC)

"safe room"


I'm curious why this search returns lots and lots of images of greenery and such, whose pages, when I open them, contain neither "safe" nor "room", let alone "safe room" as a phrase. And then, when I remove the "&title=Special:MediaSearch&go=Go&type=image" portion, i.e. this search, suddenly the images are on-topic. In other words, going to commons.wikimedia.org and then using the search box in the top right to find "safe room" appears to mess up the search because of "title=Special:MediaSearch"... --Talky Muser (talk) 19:07, 29 July 2024 (UTC)

I think what is happening is that a translation for citadel (Q88291) is being surfaced as safe room (Q1129454) or vice versa. Let me see if I can untangle that a bit in the search results, @Talky Muser. Sadads (talk) 12:21, 8 October 2024 (UTC)

Request: page move


Please move the page to Commons:Media search or Commons:Structured data team/Media search. The current title is misleading. It also discriminates against the other inputs, aside from structured data, that the media search draws upon to show results. In fact, SD is probably the least impactful in terms of the quality and inner workings of the search engine – the file title, description and categories seem more important and are also far more prevalent, detailed, extensive and well maintained than SD (but that is not the issue here). At the very top it already states that MediaSearch uses categories, structured data and wikitext, so clearly structured data is only one part of it and having this in the page title is unwarranted. I don't think the name of the team that developed this great component (thank you for that btw!) should be in the page title either. Prototyperspective (talk) 21:44, 7 October 2024 (UTC)

I am supportive of a broader title (i.e. Media Search), since the feature is so widely available to folks. Sadads (talk) 12:17, 8 October 2024 (UTC)

File is not found by MediaSearch


I was trying to find the file File:"Unteilbar" 047 (cropped).jpg. I entered the search term "unteilbar 47 cropped", but the search does not find any file, not even the similar file File:"Unteilbar" 047.jpg. All three words in the search term are actually verbatim parts of the file name. The file is in a cropped category and in the Unteilbar category. The file has a depicts statement with the Unteilbar property. And still nothing is found. C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 09:29, 12 October 2024 (UTC)

Suggestions for improvement / issues


Can you please improve the MediaSearch so it:

  1. uses file uses (especially in search-term relevant articles) as a main indicator for quality to show files further up in the results (note: Wikipedia uses would weigh more than e.g. uses on Wikiversity, and uses in good-quality main WPs like EN and ES WP would weigh more than uses in very small WPs) – the screenshot shows only a few files from relevant Wikipedia articles like Animal, alongside lots of tangential and/or unused ones
  2. uses the date field and Category:Maps by year and Category:Charts by year of latest data to show more up-to-date files further up
  3. has a date range filter (sort by recency only shows recently uploaded files, and one may want to search for images from a specific period) – Wishlist proposal & see point 9 below
  4. moves files in Category:File accuracy disputes and Category:Information graphics without data source further down (at least if not used on a large WP)
  5. moves things like book scans (page-per-page) further down (see the screenshot; a discussion here)
  6. (uses various further quality indicators, such as whether the file has been MOTD, a featured picture, etc.)
  7. shows a category hint/suggestion at the top when there is a category for what the user searched for (similar to how a hint is shown when on a singular category where a category of the plural with an extra s exists)
  8. under "Categories and Pages" on WMC it doesn't only show very many galleries (which are usually far less useful and in >95% of cases hopelessly incomplete & unmaintained) but also the category at or near the top
    • The category page is often buried underneath all these galleries even when the search term matches the cat name 1:1 – for example when searching for roses it shows many galleries but not the category. I only use that page by clicking on Namespaces->Custom->unchecking Gallery so it's a category search but doing so (knowing about this) shouldn't be expected from users of the site. One could implement this for example via reserving the top few spots in the results of that page for categories and e.g. Help: & Commons: pages.
  9. one can search within specific fields of the file Information template (this is probably tied to changes in other components) – one can already search them via insource, e.g. insource:"|source=[https://soundcloud.com to identify files for this cat or this cat and this point could maybe be implemented via developing some regex that searches for any content in the field of e.g. |source= and then creating some alias for it so instead of writing some complex regex query every time one can simply enter e.g. info-source:"search phrase for field" or info-date:"01.1990" (e.g. to populate these cats)
  10. (often uses the matching category for what has been searched for, if such a category exists, and sorts the files so that it shows many different images (in different subcats) even if these files don't have the search term in their description and the search term is not in the category title)
  11. it is more aware of synonyms – the Help page says "Similarly, when searching for a bat in text-based search, search will not find images where they're referred to by their scientific name: Chiroptera", but searching for bats should also show items with Chiroptera in the description or in the cats, because the synonymity could be established via Wikidata item aliases and/or via category redirects
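
To make points 1 and 4 a bit more concrete, here is a minimal sketch of how usage could be folded into a ranking score. It is only an illustration: every weight, the `Usage` type and the function names are made up for this example and say nothing about how MediaSearch actually scores files.

```python
from dataclasses import dataclass

# All numbers below are made-up illustration values, not real search settings.
FAMILY_WEIGHT = {"wikipedia": 1.0, "wikiversity": 0.4}
DEFAULT_FAMILY_WEIGHT = 0.2
LARGE_WIKI_BONUS = {"en": 2.0, "es": 2.0}  # multiplier for large Wikipedias
DISPUTED_PENALTY = 0.5  # e.g. file is in Category:File accuracy disputes


@dataclass
class Usage:
    """One use of a file on a wiki page (hypothetical structure)."""
    family: str    # e.g. "wikipedia" or "wikiversity"
    language: str  # e.g. "en"


def usage_score(usages):
    """Sum a weight per use; Wikipedia uses count more, large WPs most."""
    total = 0.0
    for use in usages:
        weight = FAMILY_WEIGHT.get(use.family, DEFAULT_FAMILY_WEIGHT)
        if use.family == "wikipedia":
            weight *= LARGE_WIKI_BONUS.get(use.language, 1.0)
        total += weight
    return total


def rank_score(text_relevance, usages, disputed=False):
    """Boost the text relevance by usage and demote disputed files."""
    score = text_relevance * (1.0 + usage_score(usages))
    if disputed:
        score *= DISPUTED_PENALTY
    return score
```

With these illustration values, a file used on EN WP ranks above an equally relevant file used only on Wikiversity, and a disputed file drops unless its usage outweighs the penalty.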

Note: I will edit the above list of issues to add links. Please comment if something is unclear / you have questions or if you know things that could help implement these things. Sorry if this list is a bit overwhelmingly long, I think these are the missing pieces to make the Wikimedia search truly great and the search is a key component of this large platform and one of two main ways (next to category browsing) that its users use it. Prototyperspective (talk) 00:02, 17 October 2024 (UTC)
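
For the alias idea in point 9, a rough sketch of what the rewriting step could look like: a query such as info-date:"01.1990" is expanded into an insource regex query over the matching Information template field. The alias names come from the proposal above; the mapping table, the function names, the set of escaped characters and the exact regex dialect the search backend accepts are all assumptions of this sketch.

```python
import re

# Characters escaped in this sketch; the exact set the search backend
# treats as special is an assumption.
_SPECIAL = set('\\/.?+*|{}[]()"#@&<>~')

# Hypothetical mapping from the proposed alias to the template field name.
FIELD_ALIASES = {"info-source": "source", "info-date": "date"}

# Matches e.g. info-date:"01.1990" and captures the alias and the phrase.
_ALIAS_RE = re.compile(r'(info-[a-z]+):"([^"]*)"')


def escape_phrase(phrase):
    """Backslash-escape characters that would otherwise act as regex syntax."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in phrase)


def expand_aliases(query):
    """Rewrite known info-*:"phrase" aliases into insource:/regex/ queries."""
    def _replace(match):
        alias, phrase = match.group(1), match.group(2)
        field = FIELD_ALIASES.get(alias)
        if field is None:
            return match.group(0)  # unknown alias: leave the query untouched
        # "| field = ... phrase", stopping at the next pipe character.
        return "insource:/\\| *" + field + " *=[^|]*" + escape_phrase(phrase) + "/"

    return _ALIAS_RE.sub(_replace, query)
```

One known weakness of this sketch: stopping at the next pipe means it misses phrases that come after a piped link or a nested template inside the field.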

Unrelated Commons search results

MediaSearch for "our world in data"
In this search (sorted by recency) lots of unrelated files are shown which I highlighted in the screenshot on the right.
  • For some, the files contain words like "our" and "world" – I think MediaSearch should probably detect when the search phrase matches a category and then use the category more to display mostly (often only) results within that category. I don't know how much it would or should affect files shown when sorted by "Recency" but think a few of the highlighted files should probably be excluded that way (probably not the file in the bottom left which has this text in its Information template: World Data Base II data).
  • Many of those files only have e.g. "Butterflies and Moths of the World" set as depicts statement. That one of multiple words of a phrase searched for matches one of multiple words of a depicts statement shouldn't make MediaSearch show these files.
--Prototyperspective (talk) 12:08, 23 October 2024 (UTC)[reply]
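
A small sketch of the rule suggested in the second bullet: a depicts label should only count as a match when every word of the searched phrase occurs in it, not just one. The function names and the simple word tokenization are assumptions made for illustration, not how MediaSearch matches statements.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def label_matches_query(query, label):
    """True only if every query word occurs in the depicts label."""
    query_words = tokenize(query)
    label_words = set(tokenize(label))
    return bool(query_words) and all(word in label_words for word in query_words)
```

Under this rule, the query "our world in data" would no longer match a depicts label such as "Butterflies and Moths of the World", which shares only the word "world" with it.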