ORKG-NLP Services
Supported Services
ORKG Service | Version* | Huggingface Repository | Description
---|---|---|---
predicates-clustering | | | Recommendation service for ORKG predicates based on clustering.
bioassays-semantification | | | Semantification service for BioAssays based on clustering.
cs-ner | | | Annotation service for research papers in the Computer Science domain based on named entity recognition.
tdm-extraction | v0.1.0 | | Annotation service for Task-Dataset-Metric (TDM) extraction of research papers.
templates-recommendation | | | Recommendation service for ORKG templates based on Natural Language Inference (NLI).
agri-ner | | | Annotation service for research papers in the Agriculture domain based on named entity recognition.
research-fields-classification | v0.1.0 | | Classification service for research field identification in different domains based on multi-class classification.
(*) Please refer to the release notes or README.md file in the release assets for more information about the version.
To get started with any ORKG NLP service, use `orkgnlp.load()` and pass the service name from the table above.
```python
import orkgnlp

service = orkgnlp.load('predicates-clustering')  # This will also download the required model files.
predicates = service(title='paper title', abstract='long abstract text here')

service = orkgnlp.load('tdm-extraction')  # This will also download the required model files.
tdms = service(text='DocTAET represented text here', top_n=10)
```
Read more about each service below!
Predicates Clustering
Overview
The predicates clustering service implements predicate recommendation on top of K-means clustering. The clustered data points are research papers represented by their titles and abstracts. Since papers are grouped semantically by the research domain they contribute to, the predicates associated with a cluster are semantically related and can be recommended for any research paper assigned to that cluster. This speeds up structuring a new paper in the ORKG and helps users converge on a shared vocabulary.
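The idea can be illustrated with a toy sketch: cluster paper vectors with K-means, attach a predicate set to each cluster, and recommend the predicates of the cluster nearest to a new paper. Everything below (the 2-D "embeddings", the naive first-k initialization, and the predicate sets) is made up for illustration and is not the actual ORKG implementation.

```python
import math


def kmeans(points, k, iters=20):
    """Minimal K-means (naive init: first k points); returns (centroids, assignment)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each paper vector joins its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign


# Toy 2-D "paper embeddings" (the real service works on title + abstract text).
papers = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
centroids, assign = kmeans(papers, k=2)

# Hypothetical predicate sets attached to each cluster (made up for the sketch).
cluster_predicates = {0: ['has method', 'has dataset'], 1: ['has species', 'has habitat']}

# Recommend: assign a new paper vector to its nearest centroid and return
# that cluster's predicates.
new_paper = [0.05, 0.1]
nearest = min(range(len(centroids)), key=lambda c: math.dist(new_paper, centroids[c]))
recommended = cluster_predicates[nearest]
print(recommended)
```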
Usage
```python
from orkgnlp.clustering import PredicatesRecommender

predicates_recommender = PredicatesRecommender()  # This will also download the required model files.
predicates = predicates_recommender(title='paper title', abstract='long abstract text here')
print(predicates)
```
and the output has the following schema:
```json
[
  {
    "id": "some_id",
    "label": "some_label"
  }
  ...
]
```
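Each recommendation is a plain dict, so the result can be post-processed like any list of dicts; for instance (with made-up ids and labels standing in for a real service response):

```python
# A response shaped like the schema above (values are made up).
predicates = [
    {"id": "P32", "label": "has research problem"},
    {"id": "P1", "label": "has method"},
]

# Collect just the labels, e.g. to show them in a UI dropdown.
labels = [p["label"] for p in predicates]
print(labels)
```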
BioAssays Semantification
Overview
The bioassay semantification service automatically semantifies bioassay descriptions based on the semantic model of the BioAssay Ontology. More information on the clustering algorithm underlying the service, its gold-standard development dataset, and its performance results can be found in our publication.
Usage
```python
from orkgnlp.clustering import BioassaysSemantifier

bioassays_semantifier = BioassaysSemantifier()  # This will also download the required model files.
labels = bioassays_semantifier(text='BioAssay text description here')
print(labels)
```
and the output has the following schema:
```json
[
  {
    "property": {
      "id": "some_id",
      "label": "some_label"
    },
    "resources": [
      {
        "id": "some_id",
        "label": "some_label"
      }
      ...
    ]
  }
  ...
]
```
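The nested schema is easy to flatten into a property-to-resources mapping when you only need the labels; a sketch over a made-up response:

```python
# A response shaped like the schema above (values are made up).
labels = [
    {
        "property": {"id": "P10", "label": "has assay format"},
        "resources": [
            {"id": "R1", "label": "cell-based format"},
            {"id": "R2", "label": "biochemical format"},
        ],
    }
]

# Flatten into {property label: [resource labels]}.
mapping = {
    entry["property"]["label"]: [r["label"] for r in entry["resources"]]
    for entry in labels
}
print(mapping)
```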
CS-NER: Computer Science Named Entity Recognition
Overview
The ORKG CS-NER system is based on a standardized set of seven contribution-centric scholarly entities viz., research problem, solution, resource, language, tool, method, and dataset. It can automatically extract all seven entity types from Computer Science publication titles. Furthermore, it can extract research problem and method entity types from Computer Science publication abstracts.
Supported Concepts
Text | Concepts
---|---
Title | research problem, solution, resource, language, tool, method, dataset
Abstract | research problem, method
Usage
```python
from orkgnlp.annotation import CSNer

annotator = CSNer()  # This will also download the required model files.
annotations = annotator(title='Your paper title here', abstract='Your paper abstract here')
print(annotations)
```
and the output has the following schema:
```json
{
  "title": [
    {
      "concept": "some_concept",
      "entities": ["annotated entity", "another annotated entity", ...]
    }
    ...
  ],
  "abstract": [
    {
      "concept": "some_concept",
      "entities": ["annotated entity", "another annotated entity", ...]
    }
    ...
  ]
}
```
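Since the title and the abstract may annotate the same concept, it can be handy to merge both parts into one entity set per concept; a sketch over a made-up response:

```python
# A response shaped like the schema above (values are made up).
annotations = {
    "title": [{"concept": "method", "entities": ["BERT"]}],
    "abstract": [
        {"concept": "method", "entities": ["BERT", "CRF"]},
        {"concept": "research problem", "entities": ["named entity recognition"]},
    ],
}

# Merge title and abstract annotations into {concept: set of entities}.
merged = {}
for part in annotations.values():
    for item in part:
        merged.setdefault(item["concept"], set()).update(item["entities"])
print(merged)
```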
If you don’t need the annotations for both the title and the abstract, you can also extract them separately. For example:
```python
from orkgnlp.annotation import CSNer

annotator = CSNer()  # This will also download the required model files.
annotations = annotator(title='Your paper title here')
# or
annotations = annotator(abstract='Your paper abstract here')
print(annotations)
```
and then each output has the following schema:
```json
[
  {
    "concept": "some_concept",
    "entities": ["annotated entity", "another annotated entity", ...]
  }
  ...
]
```
TDM-Extraction (Task-Dataset-Metric)
Overview
This service was developed as a leaderboard mining system for research publications, based on our publication. It extracts TDM (Task-Dataset-Metric) entities from text given in the DocTAET (Title, Abstract, ExperimentalSetup, TableInformation) representation.
We provide a parser that builds DocTAET representations from PDF files in this repository, and you can also find our gold TDM labels on Huggingface.
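Conceptually, a DocTAET input is just the four paper parts joined into one string. If you already have the parts as text, a minimal (hypothetical) builder could look like the sketch below; the exact separators used by the official parser may differ.

```python
def to_doctaet(title, abstract, experimental_setup, table_info):
    """Join the four DocTAET parts into a single input string.

    Illustrative sketch only, not the reference implementation: the
    official parser may use different separators or part markers.
    """
    return " ".join([title, abstract, experimental_setup, table_info])


text = to_doctaet(
    "A new NER model",
    "We study named entity recognition ...",
    "We train for 10 epochs ...",
    "F1 | Precision | Recall",
)
print(text)
```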
Usage
```python
from orkgnlp.annotation import TdmExtractor

tdm_extractor = TdmExtractor()  # This will also download the required model files.
tdms = tdm_extractor(text='DocTAET represented text here', top_n=10)
print(tdms)
```
and the output has the following schema:
```json
[
  {
    "task": "some_task",
    "dataset": "some_dataset",
    "metric": "some_metric",
    "score": 0.991233
  }
  ...
]
```
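Each candidate triple carries a score, so picking the most confident one is a one-liner; for instance (with made-up values in place of a real response):

```python
# A response shaped like the schema above (values are made up).
tdms = [
    {"task": "NER", "dataset": "CoNLL-2003", "metric": "F1", "score": 0.99},
    {"task": "NER", "dataset": "OntoNotes", "metric": "F1", "score": 0.42},
]

# Keep only the best-scoring Task-Dataset-Metric triple.
best = max(tdms, key=lambda t: t["score"])
print(best["task"], best["dataset"], best["metric"])
```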
Templates Recommendation
Overview
This service aims to foster constructing the ORKG using a predefined set of predicates that are represented by semantic building blocks called Templates. It directs ORKG users towards predicates added by domain experts, while not preventing them from adding or selecting other ones, in line with the crowdsourcing concept of the ORKG. The recommender fine-tunes the SciBERT pre-trained model with a linear layer on top, framing the task as a Natural Language Inference (NLI) problem. Note that this service and the Predicates Clustering service serve the same purpose from different perspectives. You can find our gold templates on Huggingface.
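The NLI framing can be sketched as: pair the paper text (premise) with each template label (hypothesis) and rank templates by entailment probability. The scorer below is a crude word-overlap stand-in invented for this sketch; the real service uses the fine-tuned SciBERT model.

```python
def entailment_score(premise, hypothesis):
    """Stand-in for the fine-tuned SciBERT NLI model: fakes an
    entailment probability via simple word overlap."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)


def recommend_templates(paper_text, templates, top_n=2):
    """Score every template label against the paper text and keep the top_n."""
    scored = [{"label": t, "score": entailment_score(paper_text, t)} for t in templates]
    return sorted(scored, key=lambda s: s["score"], reverse=True)[:top_n]


paper = "We benchmark a named entity recognition model on biomedical text"
templates = ["named entity recognition", "question answering", "machine translation"]
recs = recommend_templates(paper, templates)
print(recs)
```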
Usage
```python
from orkgnlp.nli import TemplatesRecommender

templates_recommender = TemplatesRecommender()  # This will also download the required model files.
templates = templates_recommender(title='paper title', abstract='long abstract text here', top_n=10)
print(templates)
```
and the output has the following schema:
```json
[
  {
    "id": "some_id",
    "label": "some_label",
    "score": 0.991233
  }
  ...
]
```
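Because every recommendation is scored, you can trim the list to confident suggestions before showing it to users; a sketch with made-up values and an arbitrary 0.5 cutoff:

```python
# A response shaped like the schema above (values are made up).
templates = [
    {"id": "R1", "label": "NER template", "score": 0.97},
    {"id": "R2", "label": "QA template", "score": 0.21},
]

# Drop low-confidence recommendations below the (arbitrary) cutoff.
confident = [t for t in templates if t["score"] > 0.5]
print([t["id"] for t in confident])
```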
Agri-NER: Agriculture Named Entity Recognition
Overview
The ORKG Agri-NER system is based on a standardized set of seven contribution-centric scholarly entities viz., research problem, solution, resource, language, tool, method, and dataset. It can automatically extract all seven entity types from Agriculture publication titles.
Supported Concepts
Text | Concepts
---|---
Title | research problem, solution, resource, language, tool, method, dataset
Usage
```python
from orkgnlp.annotation import AgriNer

annotator = AgriNer()  # This will also download the required model files.
annotations = annotator(title='Your paper title here')
print(annotations)
```
and the output has the following schema:
```json
[
  {
    "concept": "some_concept",
    "entities": ["annotated entity", "another annotated entity", ...]
  }
  ...
]
```
Research Fields Classification
Overview
This research field classification service aims to predict the corresponding research fields for given papers. It is designed to assist contributors who may not be familiar with the extensive research field taxonomy present in the ORKG, enabling them to save significant amounts of time. By analysing the title and abstract of a paper, the service suggests potential research fields that align with the content. This empowers authors to effortlessly select an appropriate research field without requiring in-depth knowledge of the research field taxonomy.
Usage
```python
from orkgnlp.annotation import ResearchFieldClassifier

rf_classifier = ResearchFieldClassifier()  # This will also download the required model files.
rfs = rf_classifier(raw_input='Your paper combined title with abstract here', top_n=10)
print(rfs)
```
and the output has the following schema:
```json
[
  {
    "research_field": "some_research_field",
    "score": 0.991233
  }
  ...
]
```
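When presenting the suggestions to a contributor, the scored fields can be sorted and rendered as readable strings; a sketch over a made-up response:

```python
# A response shaped like the schema above (values are made up).
rfs = [
    {"research_field": "Machine Learning", "score": 0.91},
    {"research_field": "Databases", "score": 0.05},
]

# Sort by confidence and render each suggestion as "field (percent)".
ranked = [
    f"{r['research_field']} ({r['score']:.0%})"
    for r in sorted(rfs, key=lambda r: r["score"], reverse=True)
]
print(ranked)
```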