ORKG-NLP Services

Supported Services

| ORKG Service | Version* | Huggingface Repository | Description |
| --- | --- | --- | --- |
| predicates-clustering | v0.2.0 | orkg/orkgnlp-predicates-clustering | Recommendation service for ORKG predicates based on clustering. |
| bioassays-semantification | v0.1.0 | orkg/orkgnlp-bioassays-semantification | Semantification service for BioAssays based on clustering. |
| cs-ner | v0.1.0 | orkg/orkgnlp-cs-ner | Annotation service for research papers in the Computer Science domain based on named entity recognition. |
| tdm-extraction | v0.1.0 | orkg/orkgnlp-tdm-extraction | Annotation service for Task-Dataset-Metric (TDM) extraction of research papers. |
| templates-recommendation | v0.1.0 | orkg/orkgnlp-templates-recommendation | Recommendation service for ORKG templates based on Natural Language Inference (NLI). |
| agri-ner | v0.1.0 | orkg/orkgnlp-agri-ner | Annotation service for research papers in the Agriculture domain based on named entity recognition. |
| research-fields-classification | v0.1.0 | orkg/orkgnlp-research-fields-classification | Classification service for research field identification in different domains based on multi-class classification. |

(*) Please refer to the release notes or README.md file in the release assets for more information about the version.

To get started with any ORKG NLP service, you can use orkgnlp.load() and pass the service name from the table above.

import orkgnlp
service = orkgnlp.load('predicates-clustering') # This will also download the required model files.
predicates = service(title='paper title', abstract='long abstract text here')

service = orkgnlp.load('tdm-extraction') # This will also download the required model files.
tdms = service(text='DocTAET represented text here', top_n=10)

Read more about each service below!

Predicates Clustering

Overview

The predicates clustering service implements a recommendation service on top of K-means clustering. The data points grouped in our clusters are research papers represented by their titles and abstracts. Because papers are grouped semantically by the research domain of their contribution, the predicates recommended for a given research paper are semantically related to it. This expedites structuring a new paper in the ORKG and encourages convergence towards a shared vocabulary across users.
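
The service itself ships pre-trained models from the Huggingface repository above; the following is only a minimal sketch of the underlying idea, with made-up papers, predicate IDs, and cluster-to-predicate mappings, assuming scikit-learn is installed:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Made-up training corpus: each paper is represented by title + abstract.
papers = [
    'A deep learning approach to image classification. We train a convolutional network ...',
    'Object detection with convolutional networks. Our model localizes objects in images ...',
    'A randomized trial of a new vaccine. We evaluate efficacy in a controlled cohort ...',
    'Immune response to viral infection. We measure antibody levels over time ...',
]

# Represent papers as TF-IDF vectors and group them semantically with K-means.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(papers)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Hypothetical mapping from each cluster to the predicates frequently used
# by the papers it contains (IDs and labels are made up).
cluster_predicates = {
    0: [{'id': 'P100', 'label': 'research problem'}],
    1: [{'id': 'P100', 'label': 'research problem'}, {'id': 'P200', 'label': 'evaluation metric'}],
}

# A new paper is assigned to its nearest cluster, and that cluster's
# predicates are recommended for it.
new_paper = 'Transformer models for image recognition. We fine-tune a vision transformer ...'
cluster = int(kmeans.predict(vectorizer.transform([new_paper]))[0])
print(cluster_predicates[cluster])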

Usage

from orkgnlp.clustering import PredicatesRecommender

predicates_recommender = PredicatesRecommender() # This will also download the required model files.
predicates = predicates_recommender(title='paper title', abstract='long abstract text here')
print(predicates)

and the output has the following schema:

[
    {
        "id": "some_id",
        "label": "some_label"
    }
    ...
]
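
Each entry describes one recommended ORKG predicate. Continuing the usage example above, you can collect just the recommended labels as follows:

predicate_labels = [predicate['label'] for predicate in predicates]
print(predicate_labels)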

BioAssays Semantification

Overview

The BioAssays semantification service automatically semantifies bioassay descriptions based on the semantic model of the BioAssay Ontology. More information on the clustering algorithm underlying the service, the gold-standard dataset used for its development, and its performance results can be found in our publication.

Usage

from orkgnlp.clustering import BioassaysSemantifier

bioassays_semantifier = BioassaysSemantifier() # This will also download the required model files.
labels = bioassays_semantifier(text='BioAssay text description here')
print(labels)

and the output has the following schema:

[
    {
        "property": {
            "id": "some_id",
            "label": "some_label"
        },
        "resources": [
            {
                "id": "some_id",
                "label": "some_label"
            }
            ...
        ]
    }
    ...
]
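
Each entry pairs a suggested property with the resources annotated for it. Continuing the usage example above, you can print them as follows:

for annotation in labels:
    resource_labels = [resource['label'] for resource in annotation['resources']]
    print(annotation['property']['label'], resource_labels)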

CS-NER: Computer Science Named Entity Recognition

Overview

The ORKG CS-NER system is based on a standardized set of seven contribution-centric scholarly entity types, namely: research problem, solution, resource, language, tool, method, and dataset. It automatically extracts all seven entity types from Computer Science publication titles, and the research problem and method entity types from Computer Science publication abstracts.

Supported Concepts

| Text | Concepts |
| --- | --- |
| Title | RESEARCH_PROBLEM, SOLUTION, RESOURCE, LANGUAGE, TOOL, METHOD, DATASET |
| Abstract | RESEARCH_PROBLEM, METHOD |

Usage

from orkgnlp.annotation import CSNer

annotator = CSNer() # This will also download the required model files.
annotations = annotator(title='Your paper title here', abstract='Your paper abstract here')
print(annotations)

and the output has the following schema:

{
    "title": [
        {
            "concept": "some_concept",
            "entities": ["annotated entity", "another annotated entity", ... ]
        }
        ...
    ],
    "abstract": [
        {
            "concept": "some_concept",
            "entities": ["annotated entity", "another annotated entity", ... ]
        }
        ...
    ]
}
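
Continuing the usage example above, you can, for instance, turn the title annotations into a mapping from concept to entities:

title_entities = {group['concept']: group['entities'] for group in annotations['title']}
print(title_entities.get('RESEARCH_PROBLEM', []))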

If you don’t need to extract the annotations for both the abstract and the title, you can also extract them separately. E.g:

from orkgnlp.annotation import CSNer

annotator = CSNer() # This will also download the required model files.
annotations = annotator(title='Your paper title here')
# or
annotations = annotator(abstract='Your paper abstract here')
print(annotations)

and then each output has the following schema:

[
    {
        "concept": "some_concept",
        "entities": ["annotated entity", "another annotated entity", ... ]
    }
    ...
]

TDM-Extraction (Task-Dataset-Metric)

Overview

This service was developed as a leaderboard mining system for research publications, as described in our publication. It extracts Task-Dataset-Metric (TDM) entities from text given in the DocTAET representation (Title, Abstract, ExperimentalSetup, and TableInformation).

We provide a parser that builds the DocTAET representation from PDF files in this repository, and you can also find our gold TDM labels on huggingface.
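
The authoritative DocTAET format is produced by that parser; purely as a rough illustration, the representation concatenates the four parts of a paper into a single string, which is then passed to the service (the section texts below are placeholders):

import orkgnlp

# Placeholder section texts; in practice they come from the DocTAET parser.
title = 'Paper title'
abstract = 'Paper abstract ...'
experimental_setup = 'Text of the experimental setup section ...'
table_info = 'Captions and contents of the result tables ...'

# Rough sketch of a DocTAET-like input string; the exact layout is
# defined by the DocTAET parser mentioned above.
doctaet = ' '.join([title, abstract, experimental_setup, table_info])

service = orkgnlp.load('tdm-extraction')  # This will also download the required model files.
tdms = service(text=doctaet, top_n=10)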

Usage

from orkgnlp.annotation import TdmExtractor

tdm_extractor = TdmExtractor() # This will also download the required model files.
tdms = tdm_extractor(text='DocTAET represented text here', top_n=10)
print(tdms)

and the output has the following schema:

[
    {
        "task": "some_task",
        "dataset": "some_dataset",
        "metric": "some_metric",
        "score": 0.991233
    }
    ...
]

Templates Recommendation

Overview

This service aims to foster constructing the ORKG using a predefined set of predicates grouped into semantic building blocks called Templates. It steers ORKG users towards predicates added by domain experts, while still allowing them to select other predicates or add new ones, in line with the crowdsourcing concept of the ORKG. The recommender is based on fine-tuning the pre-trained SciBERT model with a linear layer on top, framing the task as a Natural Language Inference (NLI) problem. Note that this service and the Predicates Clustering service serve the same purpose, but from different perspectives. You can find our gold templates on huggingface.
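
To illustrate the NLI framing only (this is not the service's fine-tuned SciBERT model), here is a sketch using a generic zero-shot NLI pipeline from Hugging Face transformers, with made-up template labels:

from transformers import pipeline

# Generic zero-shot NLI classifier; the actual service instead uses a
# SciBERT model fine-tuned on ORKG data with a linear layer on top.
classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')

paper = 'paper title. long abstract text here'
# Made-up template labels; each one is turned into an NLI hypothesis.
template_labels = ['machine learning experiment', 'clinical trial', 'reproducibility study']

# Templates whose hypothesis is entailed by the paper text score highest.
result = classifier(paper, candidate_labels=template_labels)
print(list(zip(result['labels'], result['scores'])))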

Usage

from orkgnlp.nli import TemplatesRecommender

templates_recommender = TemplatesRecommender() # This will also download the required model files.
templates = templates_recommender(title='paper title', abstract='long abstract text here', top_n=10)
print(templates)

and the output has the following schema:

[
    {
        "id": "some_id",
        "label": "some_label",
        "score": 0.991233
    }
    ...
]

Agri-NER: Agriculture Named Entity Recognition

Overview

The ORKG Agri-NER system is based on a standardized set of seven contribution-centric scholarly entity types, namely: research problem, process, method, resource, solution, location, and technology. It automatically extracts all seven entity types from Agriculture publication titles.

Supported Concepts

| Text | Concepts |
| --- | --- |
| Title | RESEARCH_PROBLEM, PROCESS, METHOD, RESOURCE, SOLUTION, LOCATION, TECHNOLOGY |

Usage

from orkgnlp.annotation import AgriNer

annotator = AgriNer() # This will also download the required model files.
annotations = annotator(title='Your paper title here')
print(annotations)

and the output has the following schema:

[
    {
        "concept": "some_concept",
        "entities": ["annotated entity", "another annotated entity", ... ]
    }
    ...
]

Research Fields Classification

Overview

The research fields classification service predicts the research fields that correspond to a given paper. It is designed to assist contributors who may not be familiar with the extensive research field taxonomy of the ORKG, saving them significant time. By analysing the title and abstract of a paper, the service suggests research fields that align with its content, allowing authors to select an appropriate research field without in-depth knowledge of the taxonomy.

Usage

from orkgnlp.annotation import ResearchFieldClassifier

rf_classifier = ResearchFieldClassifier() # This will also download the required model files.
rfs = rf_classifier(raw_input='Your paper combined title with abstract here', top_n=10)
print(rfs)

and the output has the following schema:

[
    {
        "research_field": "some_research_field",
        "score": 0.991233
    }
    ...
]
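
Since the classifier takes a single string, one straightforward way to build raw_input is to concatenate the title and abstract yourself (continuing the usage example above):

title = 'Your paper title here'
abstract = 'Your paper abstract here'
rfs = rf_classifier(raw_input=f'{title} {abstract}', top_n=10)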