Wikidata:Tools/Wikidata Topic Curator

Wikidata Topic Curator is a rewrite of ItemSubjector as a React web app that helps Wikimedians add relevant topics to items.

Features

Given a topic QID, the tool fetches articles that match the label, the aliases, or a custom user-provided term for that QID and that are currently missing the main subject property (P921).

  • Multi-term support
  • Populating terms from label and aliases
  • User-defined terms
  • Excluding items with a certain term (via a CirrusSearch affix, see below)
  • Batch upload by sending the matches to QuickStatements
  • Support for multiple languages
  • Support for any subgraph (via custom prefix)
  • Nudging users to match subtopics before parent topics (e.g. "domestic violence" before "violence") and excluding matches that already have a subtopic of the current topic when matching the parent topic. In this example, an article already matched with "domestic violence" is excluded when the tool is used to match articles to a parent topic such as "violence".
  • Three predefined subgraphs that can be selected:
      • Scientific articles
      • Journals
      • Riksdagen documents
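
The features above can be pictured as pieces of one CirrusSearch query string. The following is a minimal sketch, not the tool's actual code: the function name, its parameters, the default prefix (scholarly articles, Q13442814), and the choice to express "missing the main subject" as `-haswbstatement:P921=...` are all assumptions made for illustration. The QIDs in the usage note are placeholders, not real topic items.

```python
def build_search_query(term, topic_qid,
                       prefix="haswbstatement:P31=Q13442814",
                       affix="", subtopic_qids=()):
    """Assemble a CirrusSearch query string for one term (illustrative only).

    prefix        -- selects the subgraph (assumed default: scholarly articles)
    affix         -- extra user-supplied keywords, e.g. "-inlabel:syndrome"
    subtopic_qids -- topics whose existing matches should be left out
    """
    parts = [
        prefix,
        f'"{term}"',                            # phrase search for the term
        f"-haswbstatement:P921={topic_qid}",    # not yet matched to this topic
    ]
    # Assumption: items already matched to a subtopic are excluded this way.
    parts += [f"-haswbstatement:P921={qid}" for qid in subtopic_qids]
    if affix:
        parts.append(affix)
    return " ".join(part for part in parts if part)
```

For example, `build_search_query("violence", "Q12345", subtopic_qids=["Q67890"])` would produce a query that skips items already carrying the placeholder subtopic Q67890 as main subject.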

Suggested workflow

  • find a topic of interest in Wikidata/Wikipedia
  • send it to the tool using the user script from Wikidata (see below)
  • choose the terms you want to include (recommended: skip terms shorter than 5 characters to avoid false positives; this is now the default)
  • inspect the titles of the matched articles to make sure you have a sufficiently narrow topic
  • inspect a few of the matches more closely (go to the item or the full resource) to make sure they make sense
  • check all the items you want to match
  • log in to QuickStatements in a new tab
  • submit to QuickStatements
  • run the batch in QuickStatements
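
Two steps of the workflow above lend themselves to a short sketch: dropping short terms, and turning the checked items into QuickStatements commands. This is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, and the output uses the tab-separated QuickStatements V1 text format, which may differ from how the tool actually hands the batch over.

```python
MIN_TERM_LENGTH = 5  # terms shorter than this tend to give false positives


def usable_terms(terms, min_length=MIN_TERM_LENGTH):
    """Keep terms of at least min_length characters, dropping duplicates."""
    seen = set()
    result = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()  # compare case-insensitively
        if len(cleaned) >= min_length and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


def quickstatements_commands(item_qids, topic_qid):
    """One 'item <TAB> P921 <TAB> topic' line per checked item (V1 format)."""
    return "\n".join(f"{qid}\tP921\t{topic_qid}" for qid in item_qids)
```

The resulting text can be pasted into a new QuickStatements batch after logging in; the QIDs passed in are whatever items were checked in the match list.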

Excluding terms

Sometimes you want to exclude certain words from the labels of items. For example, when matching on "parental alienation" you may not want "parental alienation syndrome" to appear. This is easily achieved by entering "-inlabel:syndrome" as the affix on the /terms page or by adding &affix=-inlabel:syndrome to the URL.
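
As a small illustration of the URL variant, the sketch below appends an `affix` parameter to an existing link. Only the `affix` parameter name and the `/terms` path come from this page; the function name, the host, and the `qid` parameter in the usage note are placeholders assumed for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_affix(url, affix):
    """Return url with its affix query parameter set to the given value."""
    parts = urlsplit(url)
    # Keep all existing parameters except a previous affix, then add the new one.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "affix"]
    query.append(("affix", affix))
    # safe=":" keeps the colon in "-inlabel:syndrome" readable in the URL.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))
```

Calling `with_affix("https://example.org/terms?qid=Q12345", "-inlabel:syndrome")` yields the same address with `&affix=-inlabel:syndrome` appended.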

User script

Consider using User:So9q/wikidata-topic-curator-link.js to send items to the start page conveniently via the link in the Toolbox. On load, the tool automatically fetches and populates the terms from the label and aliases of the item.

Synia support

Synia topics now include links to this tool.

Source code

Impact

2025-03-18

As of 2025-03-18, there are 18,365,532 P921 statements on items that were inferred from the title. Source: a WDQS query (which times out).

26,471,737 of 45,094,308 articles in total are missing P921 (58.7%).
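
The percentage can be rechecked from the two counts. A minimal sketch (the function name is chosen for illustration only):

```python
def percent_missing(missing, total):
    """Share of items without a main subject (P921), in percent."""
    return 100 * missing / total


share = percent_missing(26_471_737, 45_094_308)
print(f"{share:.1f}%")  # prints 58.7%
```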

2024-03-07

The React rewrite is now deployed on Toolforge. It is much faster than the old Python Flask app. :D

2024-02-18 week 3

  • Scientific articles: 24 079 010/41 447 254. Items missing at least one topic: 58,10%
  • Riksdagen documents: 183 820/263 831. Items missing at least one topic: 69,67%

The tool is now hosted on a private VPS in Germany.

2024-02-01 week 1

Down to 24 153 666. Difference: about -40k.

The tool has not yet been launched in a way that enables batches larger than 30 items because of limitations in Toolforge.

2024-01-24 baseline

The number of scholarly articles missing any subject is 24 194 625. That is a few million more items needing a topic than when I quit using ItemSubjector, and recommended that others do the same, in 2022.

The number of journals still to be matched to topics has changed by only a few hundred, to 85 080, since I worked on it with ItemSubjector. We really need that to be zero so that we can match better by selecting a subset of articles. For example, when working on a physics topic we could exclude all articles not linked to a physics journal.