While various AI technologies could enhance discrete stages of the scientific research workflow (as described in Edition 1), large language models (LLMs) stand apart in their potential as an integrated research co-pilot across the entire process. As LLMs scale, they exhibit strengthened reasoning, information retrieval, and domain adaptability [Brown, Tom B. et al.][Lee, Jinhyuk et al.], suggesting promise as a versatile research assistant. Recent work already demonstrates LLMs’ capabilities for activities like literature synthesis, data analysis, and text generation. With systematic advancements in training, fine-tuning, and domain-specific task adaptation, LLMs could become a single AI agent capable of supporting researchers across their end-to-end workflows.
LLMs as co-pilots for research: SoTA and Opportunities
To fully harness LLMs as scientific assistants, their core cognitive capabilities, such as logical reasoning, knowledge mining, and truth grounding, must be enhanced. Recent research on training data curation, fine-tuning, and hybrid RAG architectures has already shown promising directional improvements. Below is a quick survey of the current State of the Art (SoTA) and the opportunities for improvement at each stage of the research workflow.
1. Knowledge Assimilation and Dissemination
SoTA: LLMs have already made great strides in advanced search and retrieval. With Retrieval Augmented Generation (RAG) architectures and specialized fine-tuning (Vertex AI has a good tutorial on this), performance on automated understanding and summarization of multi-modal scientific literature keeps improving; a minimal sketch of the retrieve-then-generate pattern follows.
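To make the RAG pattern concrete, the sketch below embeds the question and candidate passages, selects the most similar passages, and conditions the LLM's answer on them. The `embed` and `llm_generate` functions are hypothetical placeholders for whatever embedding model and LLM endpoint a team actually uses, not a specific vendor API.

```python
# Minimal RAG sketch: retrieve the most relevant passages, then condition the
# LLM's answer on them. `embed` and `llm_generate` are hypothetical stand-ins.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a dense vector for `text` from an embedding model."""
    raise NotImplementedError

def llm_generate(prompt: str) -> str:
    """Placeholder: call an LLM endpoint and return its completion."""
    raise NotImplementedError

def answer_with_retrieval(question: str, passages: list[str], top_k: int = 3) -> str:
    # Embed the corpus and the question.
    corpus = np.stack([embed(p) for p in passages])
    q = embed(question)
    # Cosine similarity between the question and every passage.
    scores = corpus @ q / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(q) + 1e-9)
    context = "\n\n".join(passages[i] for i in np.argsort(scores)[::-1][:top_k])
    # Ground the generation in the retrieved context.
    prompt = (
        "Answer using only the context below. Cite the passage you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_generate(prompt)
```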
Improvements needed
– Broadening context windows and working memory dimensions for effective knowledge extraction and integration with Knowledge Graphs [Pan, Shirui et al.]
– Continued improvements in scientific reasoning and truth grounding through curated datasets [Taylor, Ross et al.] and novel prompting and refinement approaches [Singhal, K. et al.]
2. Hypothesis Generation
SoTA: LLMs are a natural fit for pattern recognition across vast, domain-specific scientific literature. In recent months, tools like Elicit (which uses LLM-based literature exploration) have shown remarkable capabilities, not only in search and retrieval but also in discovering novel concepts and finding connections across disciplines to generate new hypotheses.
Improvements needed
– Improvements in reasoning through enhanced prompting (chain-of-thought, self-consistency) [Wang, Xuezhi et al.] and improved task-adapted benchmarks (see the sketch after this list)
– Utilizing innovative benchmarking tools for assessing scientific and mathematical reasoning capabilities during literature traversal [SVAMP].
– Merging external and proprietary (internal to a research team or lab) datasets for expanded context awareness
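As a concrete illustration of the self-consistency idea mentioned above, the sketch below samples several chain-of-thought completions at non-zero temperature and keeps the majority answer, in the spirit of [Wang, Xuezhi et al.]. `llm_generate` and `extract_final_answer` are hypothetical helpers, not a particular provider's API.

```python
# Self-consistency sketch: sample several reasoning paths, then majority-vote.
from collections import Counter

def llm_generate(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: return one sampled completion from an LLM."""
    raise NotImplementedError

def extract_final_answer(completion: str) -> str:
    """Placeholder: pull the final answer out of a chain-of-thought completion."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = f"Let's think step by step.\n\nQuestion: {question}"
    answers = [extract_final_answer(llm_generate(prompt)) for _ in range(n_samples)]
    # Majority vote across independently sampled reasoning paths.
    return Counter(answers).most_common(1)[0][0]
```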
3. Analytical Coordination
SoTA: LLMs are excellent code generators, and that ability lends itself well to orchestrating and coordinating analytical tasks. LLMs as data science and computational assistants are already unlocking new capabilities for researchers who are not computationally fluent. When bundled with computational platforms (a good example being GCP's Duet AI), these orchestrator LLMs can turn every researcher into a data scientist.
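One way to picture this orchestration role, without trusting the model to run arbitrary generated code, is to let the LLM choose among a small registry of vetted analysis functions. The sketch below is only illustrative: `llm_generate` is a hypothetical LLM call expected to return a JSON tool selection, and the two pandas tools are toy examples.

```python
# Orchestration sketch: the LLM picks an analysis step and its arguments from
# a registry of vetted functions; the host code runs the chosen function.
import json
import pandas as pd

def describe(df: pd.DataFrame) -> str:
    return df.describe().to_string()

def correlate(df: pd.DataFrame, col_a: str, col_b: str) -> str:
    return f"corr({col_a}, {col_b}) = {df[col_a].corr(df[col_b]):.3f}"

TOOLS = {"describe": describe, "correlate": correlate}

def llm_generate(prompt: str) -> str:
    """Placeholder: return a JSON tool call, e.g. {"tool": "describe", "args": {}}."""
    raise NotImplementedError

def run_analysis_step(request: str, df: pd.DataFrame) -> str:
    prompt = (
        f"Available tools: {list(TOOLS)}. "
        f'Respond with JSON {{"tool": ..., "args": {{...}}}}.\n\nTask: {request}'
    )
    call = json.loads(llm_generate(prompt))
    # Only run functions from the registry, never arbitrary generated code.
    return TOOLS[call["tool"]](df, **call.get("args", {}))
```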
Improvements needed
– Ability to integrate and work with the increasing diversity of narrow AI models [Wang, Hanchen et al.] and benchmarks [Thiyagalingam, Jeyan et al.] showing promise across different research domains
– Utilization of scientific containerization [Kurtzer, Gregory M. et al.] for orchestrating portable and repeatable research workflows
– Ability to work with and stitch together external and open-source datasets and models for research exploration (e.g. Kaggle datasets and Hugging Face model cards; a small example follows this list)
– Enhancing LLM connections with APIs and Mixture of Experts architectures [Liang, Yaobo et al.] and integration with experimental apparatus [Huang, Wenlong et al.] for specialized computation or experimentation
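As a small illustration of stitching open models together, the sketch below chains two public Hugging Face checkpoints: one summarizes an abstract, the other does zero-shot topic triage on the summary. The specific model names are illustrative defaults; any comparable checkpoints could be swapped in.

```python
# Stitching open models into a small workflow: summarize, then triage by topic.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def triage_abstract(abstract: str, topics: list[str]) -> dict:
    # Condense the abstract, then assign it to the most likely topic.
    summary = summarizer(abstract, max_length=60, min_length=15)[0]["summary_text"]
    labels = classifier(summary, candidate_labels=topics)
    return {"summary": summary, "top_topic": labels["labels"][0]}
```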
4 & 5. Inference and Validation & Discovery
SoTA: The final two stages of the research workflow are areas where LLMs still need to demonstrate more capability. While multi-modal generation and automated citation could be great aids in preparing papers for publication, LLMs would need to withstand the extreme rigor of scientific publication before being fully trusted as tools for validating new knowledge (and rightly so). Hallucinations and wrong answers might be tolerable in less rigorous knowledge work, but in science, adherence to scientific facts is non-negotiable.
Improvements needed
– Training and evaluation on specialized scientific datasets and notations for advanced pattern recognition [Weininger, David] (a small validation sketch follows this list)
– Enhanced truth grounding (possibly through a scientific computation language underpinning logical reasoning) [Wolfram]
– Improvements in LLMs as Knowledge Bases and in Retrieval Augmented Generation [Petroni, Fabio et al.][Lewis, Patrick et al.]
– Improving model explainability [Arrieta, Alejandro Barredo et al.] to validate inferences and ensure consistency with the wider knowledge base
– More work on general and transferable learning across domains, and on human-aligned reasoning and ethics [Bubeck, Sébastien et al.]
– Support for peer-review process and replication
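One modest example of the truth grounding and specialized-notation points above: before trusting a molecule proposed by an LLM, its SMILES string can be checked with a chemistry toolkit such as RDKit, which rejects strings that do not parse into a valid molecule. `llm_generate` is again a hypothetical LLM call; the external validation step is the point of the sketch.

```python
# Truth-grounding sketch: validate an LLM-proposed SMILES string with RDKit.
from rdkit import Chem

def llm_generate(prompt: str) -> str:
    """Placeholder: return an LLM completion, e.g. a candidate SMILES string."""
    raise NotImplementedError

def propose_valid_molecule(description: str, max_tries: int = 3) -> str | None:
    prompt = f"Give only a SMILES string for: {description}"
    for _ in range(max_tries):
        smiles = llm_generate(prompt).strip()
        # MolFromSmiles returns None for strings that are not valid SMILES,
        # filtering out one common class of hallucination.
        if Chem.MolFromSmiles(smiles) is not None:
            return smiles
    return None
```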
Gradual but steady progress is already being made on some of the improvement opportunities identified above. Some of the best results are appearing where there is deep collaboration between scientific institutions and technology providers, and as these efforts chip away at unlocking latent LLM capabilities for scientific assistance, the day might not be far off when every student, researcher, and faculty member has access to, and the support of, their very own JARVIS.