AI for Science (one way to think about it)
Date of Publishing
22nd of January, 2025
Author
astitva

“Science is a system, not a use case”: wise words that revealed to me the complex, interconnected web of people, technologies and institutions that keeps the system of Science functioning. The coming infusion of large-scale artificial intelligence into this system needs to be a coordinated, community undertaking. With this newsletter I hope to explore and share some of the biggest hurdles and opportunities in using AI to augment the endeavor of Science.

Science as an engine of “Knowledge production” runs on human cognition. Over hundreds of years, and through much trial and error, a whole system for protecting and nurturing that scarce resource has formed, and it keeps this engine running in the right-ish direction. This includes features like universities, the lab structure and peer review, as well as the complex infrastructure that supports the computing and experimental needs of the research workflow.

One way to think about it is as a pyramid with three layers. The base is where existing knowledge is assimilated, dissected and synthesized to produce new conjectures and research directions. These curated insights then move up to the second layer, where they meet the test of analytical and experimental rigor. This is where complex computations need to be orchestrated on diverse and heterogeneous infrastructure, and where AI potentially has the most to contribute, cognitively. The final layer is where science becomes art and fluctuations become particles, i.e. the realm of Discovery.

We are seeing early impacts and bottlenecks for AI in each of these three layers:

  • Knowledge Assimilation – Tools like Elicit, SciSpace and others are already driving value in summarization and concept discovery, but scientific reasoning remains a bottleneck when LLMs are used for hypothesis generation.
  • Analytical Coordination – The coding skills of optimized LLMs are showing increasing utility, but agency (the ability to plan, execute, reflect and repeat) is still a work in progress (LangChain and LlamaIndex are great starts).
  • Discovery – Reasoning features in sufficiently large models are encouraging, but Discovery is a system-wide coordination effort, and many gaps exist in the underlying (digital) substrate (although encouraging public and private efforts are underway).

In the next few articles I hope to uncover some of the emerging trends in AI adoption across each aspect of this ‘AI4Science’ system, and I will share what I learn (and learn from what you might share 🙂).

Several documents have helped me in the research for this approach. I summarize some of the references over here. More to follow.

One interesting (and encouraging) must-read on the “System of Science” is here.