Find helpful tutorials or share inspiring use cases on AI technology for higher education.

Shared content may not reflect the policies of Tilburg University on the use of AI. 

FAIR GPT as a Virtual Data Steward for Research Data Management

Data plays a role in nearly every part of scientific research, and proper documentation is essential to ensure that research results are accessible, trustworthy, and usable, especially in open-source projects. This article examines how Artificial Intelligence (AI) can change the way data documentation is done, helping researchers follow the FAIR principles (Findable, Accessible, Interoperable, and Reusable). Specifically, it presents FAIR GPT, an AI tool created to support researchers in practicing FAIR data management.

What is FAIR Data?

FAIR data is research data that complies with the FAIR principles: Findable, Accessible, Interoperable, and Reusable. These principles apply to both the data itself and the accompanying metadata: the descriptive information about a dataset, such as its title, creator, and how it was produced.

  • Findable: Data must be simple to find for both people and machines. This requires well-described data with rich metadata, including standardized terms and unique persistent identifiers, and discoverability through search engines and repositories (a minimal metadata sketch follows this list).
  • Accessible: Data should be easily obtainable by authorized users, ideally via clear and standardized communication protocols. Although FAIR data does not always equate to open data, the metadata should remain accessible even when access to the data itself is restricted.
  • Interoperable: Data should be represented in standardized formats and described with shared vocabularies, so that it can be combined with other datasets and processed by both humans and machines.
  • Reusable: Data should be documented well enough to be reused in future research. Clear documentation, an explicit license, and metadata about the origin and processing of the data (provenance) are crucial.
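
To make the role of metadata concrete, the sketch below builds a minimal, FAIR-oriented metadata record using schema.org Dataset fields and serializes it to JSON. The concrete values (title, DOI, keywords, license) are invented for illustration only, and schema.org is just one of several vocabularies; discipline-specific standards such as DataCite follow the same idea of rich, machine-readable description.

```python
import json

# A minimal, illustrative metadata record using schema.org "Dataset" fields.
# All concrete values (title, DOI, creator, keywords) are made up for this example.
metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Survey on study habits of first-year students",         # human-readable title
    "identifier": "https://doi.org/10.xxxx/example",                  # placeholder persistent identifier (Findable)
    "creator": [{"@type": "Person", "name": "J. Doe"}],               # who produced the data
    "keywords": ["higher education", "study habits"],                 # standardized terms (Findable / Interoperable)
    "license": "https://creativecommons.org/licenses/by/4.0/",        # explicit license (Reusable)
    "description": "Anonymized survey responses collected in 2024.",  # context and provenance (Reusable)
}

# Serializing the record as JSON makes it readable for both humans and machines.
print(json.dumps(metadata, indent=2))
```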

Challenges in Implementing FAIR Data and the Role of AI

While creating and sharing FAIR data offers many advantages, such as improved efficiency, reproducibility, and transparency in research, there are several challenges in its implementation:

Technical Challenges:

  • Lack of interoperability: A major technical challenge is the lack of interoperability between different data systems. This can make it difficult to integrate and analyze data from various sources.
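
As a purely illustrative sketch (all field names and values are invented), the snippet below shows how two systems might describe the same dataset with different field names and date formats, and the kind of hand-written mapping that is then needed to merge them. Shared vocabularies and standardized formats remove the need for this ad-hoc work.

```python
from datetime import datetime

# Two hypothetical systems describing the same dataset with different conventions.
record_system_a = {"title": "Student survey 2024", "author": "J. Doe", "date": "03/11/2024"}    # DD/MM/YYYY
record_system_b = {"name": "Student survey 2024", "creator": "Doe, J.", "issued": "2024-11-03"} # ISO 8601

# Without a shared vocabulary, merging the records requires a fragile, hand-written mapping.
field_mapping = {"title": "name", "author": "creator", "date": "issued"}

def harmonize(record_a: dict) -> dict:
    """Rename System A's fields to System B's convention and normalize the date."""
    harmonized = {field_mapping[key]: value for key, value in record_a.items()}
    harmonized["issued"] = datetime.strptime(harmonized["issued"], "%d/%m/%Y").date().isoformat()
    return harmonized

print(harmonize(record_system_a))  # {'name': 'Student survey 2024', 'creator': 'J. Doe', 'issued': '2024-11-03'}
```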

Cultural Challenges:

  • Resistance to data sharing: A significant cultural barrier is the resistance to data sharing among researchers. This resistance may stem from concerns over data ownership, competition, or the time and effort involved in making data FAIR.
  • Lack of awareness: There is often a lack of awareness about the importance of FAIR principles and the benefits of data sharing.

Organizational Challenges:

  • Limited resources and support: Organizations often face limited resources and support for data management. This can lead to a lack of investment in the necessary infrastructure, training, and personnel.

The Role of AI in Creating FAIR Data

Artificial Intelligence offers promising solutions to overcome these challenges:

  • Education and Awareness: AI tools can disseminate knowledge about FAIR principles, providing researchers with guidance and best practices in data management.
  • Technical Support in Data Management: AI can assist in generating and validating metadata, structuring datasets, and checking compliance with data standards (a minimal validation sketch follows this list).
  • Improvement of Data and Metadata: AI algorithms can identify gaps or inconsistencies in metadata, suggest standardized terms, and help improve the interoperability of data.
  • Support with Legal and Ethical Issues: AI can aid in meeting legal and ethical requirements related to data sharing.
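
As a hedged illustration of what automated metadata validation can look like, the sketch below checks a record for a handful of commonly requested fields and reports the gaps. The required-field list and the example record are assumptions made for this example; they are not FAIR GPT's actual rules or output.

```python
# Minimal sketch of automated metadata validation: flag missing or empty fields
# that FAIR-oriented checklists commonly ask for. The field list below is an
# assumption for illustration, not an official standard.
REQUIRED_FIELDS = ["name", "identifier", "creator", "license", "description", "keywords"]

def find_metadata_gaps(record: dict) -> list[str]:
    """Return the names of required fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

# Example record with deliberate gaps (no license, no keywords).
example_record = {
    "name": "Survey on study habits of first-year students",
    "identifier": "https://doi.org/10.xxxx/example",  # placeholder DOI
    "creator": "J. Doe",
    "description": "Anonymized survey responses collected in 2024.",
}

print("Missing or empty fields:", find_metadata_gaps(example_record))  # ['license', 'keywords']
```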

Introduction to FAIR GPT

FAIR GPT is a specialized AI tool available within ChatGPT Plus, designed to support researchers in implementing the FAIR data principles. It was created by Dr. Renat Shigapov, a Data Science Consultant and Data Scientist at the University of Mannheim. Acting as a virtual data steward, FAIR GPT provides personalized assistance to make data more findable, accessible, interoperable, and reusable. It connects with external APIs to reduce errors, recommend repositories, assess metadata, and suggest improvements. In this way, AI can help make FAIR data sharing less labor-intensive.

How to use FAIR GPT

To get started with FAIR GPT, follow these steps:

  1. Subscribe to ChatGPT Plus: Purchase a subscription to ChatGPT Plus, as FAIR GPT is available through this paid service.
  2. Access FAIR GPT: After subscribing, you can access FAIR GPT through this link.
  3. Start Using FAIR GPT: You can use FAIR GPT in various ways, including:
    • Uploading your metadata or part of your data and requesting assistance.
    • Copying and pasting your metadata or data into a prompt and asking for help.
    • Providing a link to your data and requesting an evaluation.
    • Asking questions about FAIR data principles and practices.

Why use FAIR GPT

FAIR GPT uses external sources of information, called APIs (Application Programming Interfaces), to give more accurate and reliable guidance. These APIs connect FAIR GPT to specialized databases and tools, enabling it to give well-grounded advice on how to organize and share data in a FAIR-compliant way.

APIs Integrated with FAIR GPT:

  • re3data Repositories API: Connects to re3data.org, a directory of trusted repositories where researchers can store their data. This helps FAIR GPT suggest suitable repositories, making data easier to find and access.
  • FAIR Enough API: Checks if a dataset meets the FAIR principles, helping users understand how “FAIR” their data is and offering advice on improvements.
  • Wikidata API: Links to Wikidata, a large collaborative knowledge base, to connect datasets to standardized terms and identifiers, facilitating data sharing and integration (see the sketch after this list).
  • TIB Central Terminology Service Search API: Assists in finding appropriate terms from recognized vocabularies, enhancing data searchability and interoperability.
  • FAIR-Checker: Evaluates datasets for compliance with FAIR principles using advanced techniques like Knowledge Graphs, providing detailed advice on data management improvements.
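
As a small, self-contained illustration of the kind of lookup the Wikidata integration makes possible, the sketch below calls Wikidata's public wbsearchentities endpoint to map a free-text keyword to standardized Wikidata identifiers (QIDs). It shows the general idea only; it is not FAIR GPT's internal code, and the example keyword is arbitrary.

```python
import requests  # third-party package: pip install requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def search_wikidata(term: str, limit: int = 3) -> list[dict]:
    """Look up a free-text term and return candidate Wikidata entities (QID, label, description)."""
    params = {
        "action": "wbsearchentities",  # standard Wikidata entity-search action
        "search": term,
        "language": "en",
        "format": "json",
        "limit": limit,
    }
    response = requests.get(WIKIDATA_API, params=params, timeout=10)
    response.raise_for_status()
    results = response.json().get("search", [])
    return [
        {"id": item["id"], "label": item.get("label"), "description": item.get("description")}
        for item in results
    ]

# Example: map the keyword "higher education" to standardized Wikidata entities.
for candidate in search_wikidata("higher education"):
    print(candidate["id"], "-", candidate["label"], "-", candidate["description"])
```

Linking dataset keywords to stable identifiers such as QIDs is one concrete way to make metadata interoperable across systems.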

Why Do These APIs Help Reduce Errors?

Sometimes AI models like ChatGPT can generate incorrect or made-up information, which is known as a “hallucination.” This happens when the AI does not have enough reliable data to answer a question accurately. By using external APIs, FAIR GPT connects to real-time, trusted databases and tools, making sure that the information it provides is up to date and verified. This reduces the chances of giving incorrect advice and makes its recommendations more trustworthy.
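
To make this grounding pattern concrete, here is a hedged sketch: download the list of repositories from re3data's public registry API and only recommend names that actually appear in that verified list. The endpoint and XML element names are assumptions based on re3data's publicly documented v1 API and should be verified; this illustrates the general pattern, not FAIR GPT's actual implementation.

```python
import xml.etree.ElementTree as ET

import requests  # third-party package: pip install requests

# Public re3data registry endpoint (assumed from the documented v1 API; verify before relying on it).
RE3DATA_REPOSITORIES_URL = "https://www.re3data.org/api/v1/repositories"

def fetch_verified_repository_names() -> set[str]:
    """Download the re3data repository list and return the set of repository names."""
    response = requests.get(RE3DATA_REPOSITORIES_URL, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    names = set()
    for element in root.iter():
        # Match <name> elements regardless of any XML namespace, since the exact schema may differ.
        if element.tag.rsplit("}", 1)[-1] == "name" and element.text:
            names.add(element.text.strip())
    return names

def recommend(candidates: list[str], verified_names: set[str]) -> list[str]:
    """Keep only candidate repositories that appear in the verified registry list."""
    return [name for name in candidates if name in verified_names]

verified = fetch_verified_repository_names()
# "Zenodo" is a real registry entry; "MadeUpRepo 3000" stands in for a hallucinated suggestion.
print(recommend(["Zenodo", "MadeUpRepo 3000"], verified))
```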

Prompts for FAIR GPT:

  • “Can you evaluate my metadata and suggest improvements?” Upload your metadata or paste it into the chat window, and FAIR GPT will analyze it for completeness and compliance with standards.
  • “Which data repository is the most suitable for my dataset?” Describe your data, and FAIR GPT can recommend suitable repositories using the re3data API.
  • “Help me create a data management plan.” FAIR GPT can guide you through creating a data management plan, including metadata, access guidelines, and ethical considerations.
  • “How can I link my dataset to relevant knowledge graphs?” FAIR GPT can advise on how to use Wikidata and other knowledge graphs to improve the interoperability of your data.

In addition to these specific prompts, you can also ask FAIR GPT open-ended questions about your specific dataset or FAIR-related challenges. FAIR GPT is designed to handle a wide range of prompts and requests related to FAIR data.

Conclusion

AI technologies, such as FAIR GPT, can play a role in helping researchers adhere to the FAIR data principles (findable, accessible, interoperable, and reusable). Firstly, AI-driven tools like FAIR GPT can automate different aspects of data management, reducing the workload for researchers. Secondly, AI chatbots like FAIR GPT can act as virtual data stewards, enabling researchers to ask questions and receive guidance on various aspects of FAIR data management.

Resources