ORKG-NLP Services
Supported Services
ORKG Service | Version* | Huggingface Repository | Description
---|---|---|---
predicates-clustering | | | Recommendation service for ORKG predicates based on clustering.
bioassays-semantification | | | Semantification service for BioAssays based on clustering.
cs-ner | | | Annotation service for research papers in the Computer Science domain based on named entity recognition.
tdm-extraction | v0.1.0 | | Annotation service for Task-Dataset-Metric (TDM) extraction of research papers.
templates-recommendation | | | Recommendation service for ORKG templates based on Natural Language Inference (NLI).
agri-ner | | | Annotation service for research papers in the Agriculture domain based on named entity recognition.
research-fields-classification | v0.1.0 | | Classification service for research field identification in different domains based on multi-class classification.
(*) Please refer to the release notes or README.md file in the release assets for more information about the version.
To get started with any ORKG NLP service, use `orkgnlp.load()` and pass the service name from the table above.
```python
import orkgnlp

service = orkgnlp.load('predicates-clustering')  # This will also download the required model files.
predicates = service(title='paper title', abstract='long abstract text here')

service = orkgnlp.load('tdm-extraction')  # This will also download the required model files.
tdms = service(text='DocTAET represented text here', top_n=10)
```
Read more about each service below!
Predicates Clustering
Overview
The predicates clustering service implements predicate recommendation on top of K-means clustering. The clustered data points are research papers represented by their titles and abstracts. Since papers are grouped semantically by the research domain they contribute to, the predicates associated with a cluster are semantically related and can be recommended for any research paper assigned to that cluster. This speeds up structuring a new paper in the ORKG and helps users converge on a shared vocabulary.
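The idea can be illustrated with a toy sketch: cluster paper vectors with K-means, attach a predicate set to each cluster, and recommend the predicates of the cluster nearest to a new paper. Everything below (the 2-D "embeddings", the naive first-k initialization, and the predicate sets) is made up for illustration and is not the actual ORKG implementation.

```python
import math


def kmeans(points, k, iters=20):
    """Minimal K-means (naive init: first k points); returns (centroids, assignment)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each paper vector joins its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign


# Toy 2-D "paper embeddings" (the real service works on title + abstract text).
papers = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
centroids, assign = kmeans(papers, k=2)

# Hypothetical predicate sets attached to each cluster (made up for the sketch).
cluster_predicates = {0: ['has method', 'has dataset'], 1: ['has species', 'has habitat']}

# Recommend: assign a new paper vector to its nearest centroid and return
# that cluster's predicates.
new_paper = [0.05, 0.1]
nearest = min(range(len(centroids)), key=lambda c: math.dist(new_paper, centroids[c]))
recommended = cluster_predicates[nearest]
print(recommended)
```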
Usage
```python
from orkgnlp.clustering import PredicatesRecommender

predicates_recommender = PredicatesRecommender()  # This will also download the required model files.
predicates = predicates_recommender(title='paper title', abstract='long abstract text here')
print(predicates)
```
and the output has the following schema:
```json
[
  {
    "id": "some_id",
    "label": "some_label"
  }
  ...
]
```
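Each recommendation is a plain dict, so the result can be post-processed like any list of dicts; for instance (with made-up ids and labels standing in for a real service response):

```python
# A response shaped like the schema above (values are made up).
predicates = [
    {"id": "P32", "label": "has research problem"},
    {"id": "P1", "label": "has method"},
]

# Collect just the labels, e.g. to show them in a UI dropdown.
labels = [p["label"] for p in predicates]
print(labels)
```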
BioAssays Semantification
Overview
The bioassay semantification service automatically semantifies bioassay descriptions based on the semantic model of the BioAssay Ontology. More information on the clustering algorithm underlying the service, its gold-standard development dataset, and its performance results can be found in our publication.
Usage
```python
from orkgnlp.clustering import BioassaysSemantifier

bioassays_semantifier = BioassaysSemantifier()  # This will also download the required model files.
labels = bioassays_semantifier(text='BioAssay text description here')
print(labels)
```
and the output has the following schema:
```json
[
  {
    "property": {
      "id": "some_id",
      "label": "some_label"
    },
    "resources": [
      {
        "id": "some_id",
        "label": "some_label"
      }
      ...
    ]
  }
  ...
]
```
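The nested schema is easy to flatten into a property-to-resources mapping when you only need the labels; a sketch over a made-up response:

```python
# A response shaped like the schema above (values are made up).
labels = [
    {
        "property": {"id": "P10", "label": "has assay format"},
        "resources": [
            {"id": "R1", "label": "cell-based format"},
            {"id": "R2", "label": "biochemical format"},
        ],
    }
]

# Flatten into {property label: [resource labels]}.
mapping = {
    entry["property"]["label"]: [r["label"] for r in entry["resources"]]
    for entry in labels
}
print(mapping)
```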
CS-NER: Computer Science Named Entity Recognition
Overview
The ORKG CS-NER system is based on a standardized set of seven contribution-centric scholarly entities viz., research problem, solution, resource, language, tool, method, and dataset. It can automatically extract all seven entity types from Computer Science publication titles. Furthermore, it can extract research problem and method entity types from Computer Science publication abstracts.
Supported Concepts
Text | Concepts
---|---
Title | research problem, solution, resource, language, tool, method, dataset
Abstract | research problem, method
Usage
```python
from orkgnlp.annotation import CSNer

annotator = CSNer()  # This will also download the required model files.
annotations = annotator(title='Your paper title here', abstract='Your paper abstract here')
print(annotations)
```
and the output has the following schema:
```json
{
  "title": [
    {
      "concept": "some_concept",
      "entities": ["annotated entity", "another annotated entity", ...]
    }
    ...
  ],
  "abstract": [
    {
      "concept": "some_concept",
      "entities": ["annotated entity", "another annotated entity", ...]
    }
    ...
  ]
}
```
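Since the title and the abstract may annotate the same concept, it can be handy to merge both parts into one entity set per concept; a sketch over a made-up response:

```python
# A response shaped like the schema above (values are made up).
annotations = {
    "title": [{"concept": "method", "entities": ["BERT"]}],
    "abstract": [
        {"concept": "method", "entities": ["BERT", "CRF"]},
        {"concept": "research problem", "entities": ["named entity recognition"]},
    ],
}

# Merge title and abstract annotations into {concept: set of entities}.
merged = {}
for part in annotations.values():
    for item in part:
        merged.setdefault(item["concept"], set()).update(item["entities"])
print(merged)
```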
If you don’t need the annotations for both the title and the abstract, you can also extract them separately. For example:
```python
from orkgnlp.annotation import CSNer

annotator = CSNer()  # This will also download the required model files.
annotations = annotator(title='Your paper title here')
# or
annotations = annotator(abstract='Your paper abstract here')
print(annotations)
```
and then each output has the following schema:
```json
[
  {
    "concept": "some_concept",
    "entities": ["annotated entity", "another annotated entity", ...]
  }
  ...
]
```
TDM-Extraction (Task-Dataset-Metric)
Overview
This service was developed as a leaderboard mining system for research publications, based on our publication. It extracts TDM (Task-Dataset-Metric) entities from text given in the DocTAET (Title, Abstract, ExperimentalSetup, TableInformation) representation.
We provide a parser that builds DocTAET representations from PDF files in this repository, and you can also find our gold TDM labels on Huggingface.
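Conceptually, a DocTAET input is just the four paper parts joined into one string. If you already have the parts as text, a minimal (hypothetical) builder could look like the sketch below; the exact separators used by the official parser may differ.

```python
def to_doctaet(title, abstract, experimental_setup, table_info):
    """Join the four DocTAET parts into a single input string.

    Illustrative sketch only, not the reference implementation: the
    official parser may use different separators or part markers.
    """
    return " ".join([title, abstract, experimental_setup, table_info])


text = to_doctaet(
    "A new NER model",
    "We study named entity recognition ...",
    "We train for 10 epochs ...",
    "F1 | Precision | Recall",
)
print(text)
```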
Usage
```python
from orkgnlp.annotation import TdmExtractor

tdm_extractor = TdmExtractor()  # This will also download the required model files.
tdms = tdm_extractor(text='DocTAET represented text here', top_n=10)
print(tdms)
```
and the output has the following schema:
```json
[
  {
    "task": "some_task",
    "dataset": "some_dataset",
    "metric": "some_metric",
    "score": 0.991233
  }
  ...
]
```
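Each candidate triple carries a score, so picking the most confident one is a one-liner; for instance (with made-up values in place of a real response):

```python
# A response shaped like the schema above (values are made up).
tdms = [
    {"task": "NER", "dataset": "CoNLL-2003", "metric": "F1", "score": 0.99},
    {"task": "NER", "dataset": "OntoNotes", "metric": "F1", "score": 0.42},
]

# Keep only the best-scoring Task-Dataset-Metric triple.
best = max(tdms, key=lambda t: t["score"])
print(best["task"], best["dataset"], best["metric"])
```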
Templates Recommendation
Overview
This service aims to foster constructing the ORKG using a predefined set of predicates that are represented by semantic building blocks called Templates. It directs ORKG users towards predicates added by domain experts, while not preventing them from adding or selecting other ones, in line with the crowdsourcing concept of the ORKG. The recommender fine-tunes the SciBERT pre-trained model with a linear layer on top, framing the task as a Natural Language Inference (NLI) problem. Note that this service and the Predicates Clustering service serve the same purpose from different perspectives. You can find our gold templates on Huggingface.
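The NLI framing can be sketched as: pair the paper text (premise) with each template label (hypothesis) and rank templates by entailment probability. The scorer below is a crude word-overlap stand-in invented for this sketch; the real service uses the fine-tuned SciBERT model.

```python
def entailment_score(premise, hypothesis):
    """Stand-in for the fine-tuned SciBERT NLI model: fakes an
    entailment probability via simple word overlap."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)


def recommend_templates(paper_text, templates, top_n=2):
    """Score every template label against the paper text and keep the top_n."""
    scored = [{"label": t, "score": entailment_score(paper_text, t)} for t in templates]
    return sorted(scored, key=lambda s: s["score"], reverse=True)[:top_n]


paper = "We benchmark a named entity recognition model on biomedical text"
templates = ["named entity recognition", "question answering", "machine translation"]
recs = recommend_templates(paper, templates)
print(recs)
```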
Usage
```python
from orkgnlp.nli import TemplatesRecommender

templates_recommender = TemplatesRecommender()  # This will also download the required model files.
templates = templates_recommender(title='paper title', abstract='long abstract text here', top_n=10)
print(templates)
```
and the output has the following schema:
```json
[
  {
    "id": "some_id",
    "label": "some_label",
    "score": 0.991233
  }
  ...
]
```
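Because every recommendation is scored, you can trim the list to confident suggestions before showing it to users; a sketch with made-up values and an arbitrary 0.5 cutoff:

```python
# A response shaped like the schema above (values are made up).
templates = [
    {"id": "R1", "label": "NER template", "score": 0.97},
    {"id": "R2", "label": "QA template", "score": 0.21},
]

# Drop low-confidence recommendations below the (arbitrary) cutoff.
confident = [t for t in templates if t["score"] > 0.5]
print([t["id"] for t in confident])
```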
Agri-NER: Agriculture Named Entity Recognition
Overview
The ORKG Agri-NER system is based on a standardized set of seven contribution-centric scholarly entities viz., research problem, solution, resource, language, tool, method, and dataset. It can automatically extract all seven entity types from Agriculture publication titles.
Supported Concepts
Text | Concepts
---|---
Title | research problem, solution, resource, language, tool, method, dataset
Usage
```python
from orkgnlp.annotation import AgriNer

annotator = AgriNer()  # This will also download the required model files.
annotations = annotator(title='Your paper title here')
print(annotations)
```
and the output has the following schema:
```json
[
  {
    "concept": "some_concept",
    "entities": ["annotated entity", "another annotated entity", ...]
  }
  ...
]
```
Research Fields Classification
Overview
This research field classification service aims to predict the corresponding research fields for given papers. It is designed to assist contributors who may not be familiar with the extensive research field taxonomy present in the ORKG, enabling them to save significant amounts of time. By analysing the title and abstract of a paper, the service suggests potential research fields that align with the content. This empowers authors to effortlessly select an appropriate research field without requiring in-depth knowledge of the research field taxonomy.
Usage
```python
from orkgnlp.annotation import ResearchFieldClassifier

rf_classifier = ResearchFieldClassifier()  # This will also download the required model files.
rfs = rf_classifier(raw_input='Your paper combined title with abstract here', top_n=10)
print(rfs)
```
and the output has the following schema:
```json
[
  {
    "research_field": "some_research_field",
    "score": 0.991233
  }
  ...
]
```
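When presenting the suggestions to a contributor, the scored fields can be sorted and rendered as readable strings; a sketch over a made-up response:

```python
# A response shaped like the schema above (values are made up).
rfs = [
    {"research_field": "Machine Learning", "score": 0.91},
    {"research_field": "Databases", "score": 0.05},
]

# Sort by confidence and render each suggestion as "field (percent)".
ranked = [
    f"{r['research_field']} ({r['score']:.0%})"
    for r in sorted(rfs, key=lambda r: r["score"], reverse=True)
]
print(ranked)
```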