Mohit Chandra and Yiqiao (Ahren) Jin
The Web Conference 2024

Georgia Tech researchers say non-English speakers shouldn’t rely on chatbots like ChatGPT for trustworthy healthcare advice. 

A team of researchers from the College of Computing at Georgia Tech has developed a framework for assessing how well large language models (LLMs) answer health-related questions across languages.

Ph.D. students Mohit Chandra and Yiqiao (Ahren) Jin are the co-lead authors of the paper Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries. 

Their paper’s findings reveal a gap between how well LLMs answer health-related questions in English and how well they answer them in other languages. Chandra and Jin point out the limitations of LLMs for users and developers but also highlight their potential. 

Their XLingEval framework cautions non-English speakers against using chatbots as alternatives to doctors for advice. However, models can improve by deepening the data pool with multilingual source material such as their proposed XLingHealth benchmark.

“For users, our research supports what ChatGPT’s website already states: chatbots make a lot of mistakes, so we should not rely on them for critical decision-making or for information that requires high accuracy,” Jin said.   

“Since we observed this language disparity in their performance, LLM developers should focus on improving accuracy, correctness, consistency, and reliability in other languages,” Jin said. 

Using XLingEval, the researchers found chatbots are less accurate in Spanish, Chinese, and Hindi compared to English. By focusing on correctness, consistency, and verifiability, they discovered the following (a simplified sketch of how answer consistency can be measured appears after the list): 

  • Correctness decreased by 18% when the same questions were asked in Spanish, Chinese, and Hindi. 
  • Answers in non-English were 29% less consistent than their English counterparts. 
  • Non-English responses were 13% less verifiable overall. 
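
The paper defines its own measures for each of these dimensions. As a rough, simplified illustration of how consistency among several responses to the same question might be quantified, the sketch below compares invented example answers using TF-IDF cosine similarity; both the answers and the similarity proxy are assumptions for illustration, not the XLingEval metric itself.

```python
# Illustrative only: a rough proxy for consistency among several responses
# to the same question. The example answers are invented, and TF-IDF cosine
# similarity stands in for the paper's own consistency measure.
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def consistency_score(responses):
    """Mean pairwise cosine similarity among responses to one question."""
    tfidf = TfidfVectorizer().fit_transform(responses)
    pairs = combinations(range(len(responses)), 2)
    sims = [cosine_similarity(tfidf[i], tfidf[j])[0, 0] for i, j in pairs]
    return sum(sims) / len(sims)


english_answers = [
    "Take ibuprofen with food to reduce stomach irritation.",
    "Ibuprofen should be taken with food to avoid stomach upset.",
    "Taking ibuprofen alongside a meal helps prevent stomach irritation.",
]
other_language_answers = [  # hypothetical answers to the same question
    "Ibuprofen can be taken at any time of day.",
    "Take ibuprofen with food to reduce stomach irritation.",
    "Avoid ibuprofen entirely if you have a headache.",
]

print("English consistency:", round(consistency_score(english_answers), 2))
print("Non-English consistency:", round(consistency_score(other_language_answers), 2))
```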

XLingHealth contains question-answer pairs that chatbots can reference, which the group hopes will spark improvement within LLMs.  

The HealthQA dataset uses specialized healthcare articles from the popular healthcare website Patient. It includes 1,134 health-related question-answer pairs as excerpts from original articles.  

LiveQA is a second dataset containing 246 question-answer pairs constructed from frequently asked question (FAQ) platforms associated with the U.S. National Institutes of Health (NIH).  

For drug-related questions, the group built a MedicationQA component. This dataset contains 690 questions extracted from anonymous consumer queries submitted to MedlinePlus. The answers are sourced from medical references, such as MedlinePlus and DailyMed.   

In their tests, the researchers asked over 2,000 medical-related questions to ChatGPT-3.5 and MedAlpaca. MedAlpaca is a healthcare question-answer chatbot trained on medical literature. Yet, more than 67% of its responses to non-English questions were irrelevant or contradictory.  

“We see far worse performance in the case of MedAlpaca than ChatGPT,” Chandra said. 

“The majority of the data for MedAlpaca is in English, so it struggled to answer queries in non-English languages. GPT also struggled, but it performed much better than MedAlpaca because it had some sort of training data in other languages.” 

Ph.D. student Gaurav Verma and postdoctoral researcher Yibo Hu co-authored the paper. 

Jin and Verma study under Srijan Kumar, an assistant professor in the School of Computational Science and Engineering, and Hu is a postdoc in Kumar’s lab. Chandra is advised by Munmun De Choudhury, an associate professor in the School of Interactive Computing. 
 
The team will present their paper at The Web Conference, occurring May 13-17 in Singapore. The annual conference focuses on the future direction of the internet. The group’s topic is a fitting match for the conference's location.  

English and Chinese are the most common languages in Singapore. The group tested Spanish, Chinese, and Hindi because they are the world’s most spoken languages after English. Personal curiosity and background played a part in inspiring the study. 

“ChatGPT was very popular when it launched in 2022, especially for us computer science students who are always exploring new technology,” said Jin. “Non-native English speakers, like Mohit and I, noticed early on that chatbots underperformed in our native languages.” 

School of Interactive Computing communications officer Nathan Deen and School of Computational Science and Engineering communications officer Bryant Wine contributed to this report.

News Contact

Bryant Wine, Communications Officer
bryant.wine@cc.gatech.edu

Nathan Deen, Communications Officer
ndeen6@cc.gatech.edu

CHI 2024 Farsight

Thanks to a Georgia Tech researcher's new tool, application developers can now see the potential harms of their prototypes.

Farsight is a tool designed for developers who use large language models (LLMs) to create applications powered by artificial intelligence (AI). Farsight alerts prototypers when they write LLM prompts that could be harmful or misused.

Downstream users can expect to benefit from better quality and safer products made with Farsight’s assistance. The tool’s lasting impact, though, is that it fosters responsible AI awareness by coaching developers on the proper use of LLMs.

Machine Learning Ph.D. candidate Zijie (Jay) Wang is Farsight’s lead architect. He will present the paper at the upcoming Conference on Human Factors in Computing Systems (CHI 2024). Farsight ranked in the top 5% of papers accepted to CHI 2024, earning it an honorable mention for the conference’s best paper award.

“LLMs have empowered millions of people with diverse backgrounds, including writers, doctors, and educators, to build and prototype powerful AI apps through prompting. However, many of these AI prototypers don’t have training in computer science, let alone responsible AI practices,” said Wang.

“With a growing number of AI incidents related to LLMs, it is critical to make developers aware of the potential harms associated with their AI applications.”

Wang referenced an example in which two lawyers used ChatGPT to write a legal brief. A U.S. judge sanctioned the lawyers because their submitted brief contained six fictitious case citations that the LLM fabricated.

With Farsight, the group aims to improve developers’ awareness of responsible AI use. It achieves this by highlighting potential use cases, affected stakeholders, and possible harm associated with an application in the early prototyping stage. 

A user study involving 42 prototypers showed that developers could better identify potential harms associated with their prompts after using Farsight. The users also found the tool more helpful and usable than existing resources. 

Feedback from the study showed Farsight encouraged developers to focus on end-users and think beyond immediate harmful outcomes.

“While resources, like workshops and online videos, exist to help AI prototypers, they are often seen as tedious, and most people lack the incentive and time to use them,” said Wang.

“Our approach was to consolidate and display responsible AI resources in the same space where AI prototypers write prompts. In addition, we leverage AI to highlight relevant real-life incidents and guide users to potential harms based on their prompts.”

Farsight employs an in-situ user interface to show developers the potential negative consequences of their applications during prototyping. 

Alert symbols for “neutral,” “caution,” and “warning” notify users when prompts require more attention. When a user clicks the alert symbol, an awareness sidebar expands from one side of the screen. 

The sidebar shows an incident panel with actual news headlines from incidents relevant to the harmful prompt. The sidebar also has a use-case panel that helps developers imagine how different groups of people can use their applications in varying contexts.

Another key feature is the harm envisioner. This functionality takes a user’s prompt as input and assists them in envisioning potential harmful outcomes. The prompt branches into an interactive node tree that lists use cases, stakeholders, and harms, like “societal harm,” “allocative harm,” “interpersonal harm,” and more.
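
The paper describes the full interface; as a rough sketch of the kind of branching data structure such a harm envisioner might build from a prompt, the snippet below is hypothetical: the field names, node kinds, and example content are assumptions for illustration, not Farsight's actual data model.

```python
# A hypothetical sketch of a harm-envisioning node tree; field names and
# example content are illustrative, not Farsight's actual implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # a use case, stakeholder, or harm description
    kind: str                       # "use_case" | "stakeholder" | "harm"
    children: list["Node"] = field(default_factory=list)


def print_tree(node: Node, depth: int = 0) -> None:
    """Print the branching tree with indentation showing each level."""
    print("  " * depth + f"[{node.kind}] {node.label}")
    for child in node.children:
        print_tree(child, depth + 1)


# Example: branching a summarization prompt into use cases, stakeholders, harms.
root = Node("Summarize a patient's medical history", "use_case", [
    Node("Patients", "stakeholder", [
        Node("Allocative harm: omitted conditions affect care decisions", "harm"),
    ]),
    Node("Clinicians", "stakeholder", [
        Node("Interpersonal harm: eroded trust after a fabricated summary", "harm"),
    ]),
])

print_tree(root)
```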

The novel design and insightful findings from the user study resulted in Farsight’s acceptance for presentation at CHI 2024.

CHI is considered the most prestigious conference for human-computer interaction and one of the top-ranked conferences in computer science.

CHI is affiliated with the Association for Computing Machinery. The conference takes place May 11-16 in Honolulu.

Wang worked on Farsight in Summer 2023 while interning at Google's People + AI Research (PAIR) group.

Farsight’s co-authors from Google PAIR include Chinmay Kulkarni, Lauren Wilcox, Michael Terry, and Michael Madaio. The group has ties to Georgia Tech that run deeper than Wang alone.

Terry, the current co-leader of Google PAIR, earned his Ph.D. in human-computer interaction from Georgia Tech in 2005. Madaio graduated from Tech in 2015 with an M.S. in digital media. Wilcox was a full-time faculty member in the School of Interactive Computing from 2013 to 2021 and serves in an adjunct capacity today.

Though not an author, one of Wang’s influences is his advisor, Polo Chau. Chau is an associate professor in the School of Computational Science and Engineering. His group specializes in data science, human-centered AI, and visualization research for social good.  

“I think what makes Farsight interesting is its unique in-workflow and human-AI collaborative approach,” said Wang. 

“Furthermore, Farsight leverages LLMs to expand prototypers’ creativity and brainstorm a wide range of use cases, stakeholders, and potential harms.”

News Contact

Bryant Wine, Communications Officer
bryant.wine@cc.gatech.edu

headshot of Maryam Alavi

There is an expectation that implementing new and emerging Generative AI (GenAI) tools enhances the effectiveness and competitiveness of organizations. This belief is evidenced by current and planned investments in GenAI tools, especially by firms in knowledge-intensive industries such as finance, healthcare, and entertainment, among others. According to forecasts, enterprise spending on GenAI will double in 2024 and grow to $151.1 billion by 2027.

However, the path to realizing return on these investments remains somewhat ambiguous. While there is a history of efficiency and productivity gains from using computers to automate large-scale routine and structured tasks across various industries, knowledge and professional jobs have largely resisted automation. This stems from the nature of knowledge work, which often involves tasks that are unstructured and ill-defined. The specific input information, desired outputs, and/or the processes of converting inputs to outputs in such tasks are not known a priori, which consequently has limited computer applications in core knowledge tasks.

GenAI tools are changing the business landscape by expanding the range of tasks that can be performed and supported by computers, including idea generation, software development, and creative writing and content production. With their advanced human-like generative abilities, GenAI tools have the potential to significantly enhance the productivity and creativity of knowledge workers. However, the question of how to integrate GenAI into knowledge work to successfully harness these advantages remains a challenge. Dictating the parameters for GenAI usage via a top-down approach, such as through formal job designs or redesigns, is difficult, as it has been observed that individuals tend to adopt new digital tools in ways that are not fully predictable. This unpredictability is especially pertinent to the use of GenAI in supporting knowledge work for the following reasons.

Continue reading: How Different Fields Are Using GenAI to Redefine Roles

Reprinted from the Harvard Business Review, March 25, 2024

Maryam Alavi is the Elizabeth D. & Thomas M. Holder Chair & Professor of IT Management, Scheller College of Business, Georgia Institute of Technology.

News Contact

Lorrie Burroughs

Kai Wang AI2050 Fellowship
Kai Wang ARMMAN visit

Schmidt Sciences has selected Kai Wang as one of 19 researchers to receive this year’s AI2050 Early Career Fellowship. In doing so, Wang becomes the first AI2050 fellow to represent Georgia Tech.

“I am excited about this fellowship because there are so many people at Georgia Tech using AI to create social impact,” said Wang, an assistant professor in the School of Computational Science and Engineering (CSE).

“I feel so fortunate to be part of this community and to help Georgia Tech bring more impact on society.”

AI2050 has allocated up to $5.5 million to support the cohort. Fellows receive up to $300,000 over two years and will join the Schmidt Sciences network of experts to advance their research in artificial intelligence (AI).

Wang’s AI2050 project centers on leveraging decision-focused AI to address challenges facing health and environmental sustainability. His goal is to strengthen and deploy decision-focused AI in collaboration with stakeholders to solve broad societal problems.

Wang’s approach to decision-focused AI integrates machine learning with optimization to train models based on the quality of the decisions they produce. These models borrow knowledge from decision-making processes in high-stakes domains to improve overall performance.
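
Decision-focused learning covers a broad family of techniques; the toy sketch below uses an assumed setup, not Wang's actual models, to show the general idea of training a predictor through a differentiable decision step so that the loss reflects decision quality rather than raw prediction error.

```python
# Toy decision-focused learning sketch (illustrative assumptions, not Wang's
# actual models): predict payoffs of candidate actions from features, pick an
# action via a softmax relaxation, and train on the decision's realized payoff.
import torch

torch.manual_seed(0)
n_actions, n_features = 5, 8
X = torch.randn(n_actions, n_features)          # features of candidate actions
true_payoff = torch.rand(n_actions)             # ground-truth payoffs (training data)

model = torch.nn.Linear(n_features, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)

for step in range(200):
    scores = model(X).squeeze(-1)               # predicted payoff per action
    choice = torch.softmax(scores * 10, dim=0)  # soft (differentiable) argmax
    decision_quality = (choice * true_payoff).sum()
    loss = -decision_quality                    # maximize the realized payoff
    opt.zero_grad()
    loss.backward()
    opt.step()

best = int(torch.argmax(model(X).squeeze(-1)))
print("chosen action:", best, "true payoff:", float(true_payoff[best]))
```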

Part of Wang’s approach is to work closely with non-profit and non-governmental organizations. This collaboration helps Wang better understand problems at the point-of-need and gain knowledge from domain experts to custom-build AI models.   

“It is very important to me to see my research impacting human lives and society,” Wang said. “That reinforces my interest and motivation in using AI for social impact.”

[Related: Wang, New Faculty Bolster School’s Machine Learning Expertise]

This year’s cohort is only the second in the fellowship’s history. Wang joins a class that spans four countries, six disciplines, and seventeen institutions.

AI2050 commits $125 million over five years to identify and support talented individuals seeking solutions to ensure society benefits from AI. Last year’s AI2050 inaugural class of 15 early career fellows received $4 million.

The namesake of AI2050 comes from the central motivating question that fellows answer through their projects:

It’s 2050. AI has turned out to be hugely beneficial to society. What happened? What are the most important problems we solved and the opportunities and possibilities we realized to ensure this outcome?

AI2050 encourages young researchers to pursue bold and ambitious work on difficult challenges and promising opportunities in AI. These projects involve research that is multidisciplinary, risky, and hard to fund through traditional means.

Schmidt Sciences, LLC is a 501(c)3 non-profit organization supported by philanthropists Eric and Wendy Schmidt. Schmidt Sciences aims to accelerate and deepen understanding of the natural world and develop solutions to real-world challenges for public benefit.

Schmidt Sciences identifies under-supported or unconventional areas of exploration and discovery with potential for high impact. Focus areas include AI and advanced computing, astrophysics and space, biosciences, climate, and cross-science.

“I am most grateful for the advice from my mentors, colleagues, and collaborators, and of course AI2050 for choosing me for this prestigious fellowship,” Wang said. “The School of CSE has given me so much support, including career advice from junior and senior level faculty.”

News Contact

Bryant Wine, Communications Officer
bryant.wine@cc.gatech.edu

Bo Zhu is an assistant professor in Georgia Tech's School of Interactive Computing

Georgia Tech Assistant Professor Bo Zhu worked on a multi-institutional team to develop a new AI benchmark for computer graphics. Photo by Eli Burakian/Dartmouth College.

Computer graphic simulations can represent natural phenomena such as tornados, underwater vortices, and liquid foams more accurately thanks to an advancement in creating artificial intelligence (AI) neural networks.

Working with a multi-institutional team of researchers, Georgia Tech Assistant Professor Bo Zhu combined computer graphic simulations with machine learning models to create enhanced simulations of known phenomena. The new benchmark could lead to researchers constructing representations of other phenomena that have yet to be simulated.

Zhu co-authored the paper Fluid Simulation on Neural Flow Maps. The Association for Computing Machinery’s Special Interest Group in Computer Graphics and Interactive Technology (SIGGRAPH) gave it a best paper award in December at the SIGGRAPH Asia conference in Sydney, Australia. 

The authors say the advancement could be as significant to computer graphic simulations as the introduction of neural radiance fields (NeRFs) was to computer vision in 2020. Introduced by researchers at the University of California-Berkeley, University of California-San Diego, and Google, NeRFs are neural networks that easily convert 2D images into 3D navigable scenes. 

NeRFs have become a benchmark among computer vision researchers. Zhu and his collaborators hope their creation, neural flow maps, can do the same for simulation researchers in computer graphics.

“A natural question to ask is, can AI fundamentally overcome the traditional method’s shortcomings and bring generational leaps to simulation as it has done to natural language processing and computer vision?” Zhu said. “Simulation accuracy has been a significant challenge to computer graphics researchers. No existing work has combined AI with physics to yield high-end simulation results that outperform traditional schemes in accuracy.”

In computer graphics, simulation pipelines play the role that network architectures play in machine learning: they are what allow simulations to take shape. They are traditionally constructed through mathematical equations and numerical schemes. 

Zhu said researchers have tried to design simulation pipelines with neural representations to construct more robust simulations. However, efforts to achieve higher physical accuracy have fallen short. 

Zhu attributes the problem to the structure of traditional simulation pipelines, which cannot take full advantage of the capacities of modern AI algorithms. To solve the problem and allow machine learning to have influence, Zhu and his collaborators proposed a new framework that redesigns the simulation pipeline.

They named these new pipelines neural flow maps. The maps use machine learning models to store spatiotemporal data more efficiently. The researchers then align these models with their mathematical framework to achieve a higher accuracy than previous pipeline simulations.
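
The paper's actual architecture is more involved; as a minimal sketch of the underlying idea of storing a spatiotemporal field in a small neural network, the snippet below fits a tiny MLP to an analytic vortex. The network and the example flow field are chosen purely for illustration and are not the authors' neural flow map design.

```python
# Minimal sketch of an implicit neural representation of a spatiotemporal
# field: a small MLP maps (x, y, t) to a 2D velocity. Illustration only;
# not the neural flow map architecture from the paper.
import torch

torch.manual_seed(0)

field_net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 2),
)

def reference_velocity(coords):
    """A known analytic vortex used as stand-in training data."""
    x, y = coords[:, 0], coords[:, 1]
    return torch.stack([-y, x], dim=1)

opt = torch.optim.Adam(field_net.parameters(), lr=1e-3)
for step in range(500):
    coords = torch.rand(256, 3) * 2 - 1           # random (x, y, t) samples
    target = reference_velocity(coords)
    pred = field_net(coords)
    loss = torch.mean((pred - target) ** 2)       # fit the network to the field
    opt.zero_grad()
    loss.backward()
    opt.step()

# Query the learned field at an arbitrary point in space-time.
print(field_net(torch.tensor([[0.5, -0.25, 0.0]])))
```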

Zhu said he does not believe machine learning should be used to replace traditional numerical equations. Rather, they should complement them to unlock new advantageous paradigms. 

“Instead of trying to deploy modern AI techniques to replace components inside traditional pipelines, we co-designed the simulation algorithm and machine learning technique in tandem,” Zhu said. 

“Numerical methods are not optimal because of their limited computational capacity. Recent AI-driven capacities have uplifted many of these limitations. Our task is redesigning existing simulation pipelines to take full advantage of these new AI capacities.” 

In the paper, the authors state the once unattainable algorithmic designs could unlock new research possibilities in computer graphics. 

Neural flow maps offer “a new perspective on the incorporation of machine learning in numerical simulation research for computer graphics and computational sciences alike,” the paper states.

“The success of Neural Flow Maps is inspiring for how physics and machine learning are best combined,” Zhu added.

News Contact

Nathan Deen, Communications Officer

Georgia Tech School of Interactive Computing

nathan.deen@cc.gatech.edu

Anna (Anya) Ivanova
The Intersection of AI and Cognitive Neuroscience

One of the hallmarks of humanity is language, but now, powerful new artificial intelligence tools also compose poetry, write songs, and have extensive conversations with human users. Tools like ChatGPT and Gemini are widely available at the tap of a button — but just how smart are these AIs? 

A new multidisciplinary research effort co-led by Anna (Anya) Ivanova, assistant professor in the School of Psychology at Georgia Tech, alongside Kyle Mahowald, an assistant professor in the Department of Linguistics at the University of Texas at Austin, is working to uncover just that.

Their results could lead to innovative AIs that are more similar to the human brain than ever before — and also help neuroscientists and psychologists who are unearthing the secrets of our own minds. 

The study, “Dissociating Language and Thought in Large Language Models,” is published this week in the scientific journal Trends in Cognitive Sciences. The work is already making waves in the scientific community: an earlier preprint of the paper, released in January 2023, has already been cited more than 150 times by fellow researchers. The research team has continued to refine the research for this final journal publication. 

“ChatGPT became available while we were finalizing the preprint,” Ivanova explains. “Over the past year, we've had an opportunity to update our arguments in light of this newer generation of models, now including ChatGPT.”

Form versus function

The study focuses on large language models (LLMs), which include AIs like ChatGPT. LLMs are text prediction models: they create writing by predicting which word comes next in a sentence, much as a cell phone or email service like Gmail suggests the next word you might want to write. However, while this type of language learning is extremely effective at creating coherent sentences, that doesn’t necessarily signify intelligence.
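
For instance, a small open model such as GPT-2 can be queried through the Hugging Face transformers library to continue a prompt by predicting the next words. The prompt below is arbitrary; this only illustrates text prediction in general, not ChatGPT itself.

```python
# Next-word prediction in action with a small open model (GPT-2); the prompt
# is arbitrary. This illustrates text prediction generally, not ChatGPT.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The patient should take the medication with",
                   max_new_tokens=10, num_return_sequences=1)
print(result[0]["generated_text"])
```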

Ivanova’s team argues that formal competence — creating a well-structured, grammatically correct sentence — should be differentiated from functional competence — answering the right question, communicating the correct information, or appropriately communicating. They also found that while LLMs trained on text prediction are often very good at formal skills, they still struggle with functional skills.

“We humans have the tendency to conflate language and thought,” Ivanova says. “I think that’s an important thing to keep in mind as we're trying to figure out what these models are capable of, because using that ability to be good at language, to be good at formal competence, leads many people to assume that AIs are also good at thinking — even when that's not the case.

“It's a heuristic that we developed when interacting with other humans over thousands of years of evolution, but now in some respects, that heuristic is broken,” Ivanova explains. 

The distinction between formal and functional competence is also vital in rigorously testing an AI’s capabilities, Ivanova adds. Evaluations often don’t distinguish formal and functional competence, making it difficult to assess what factors are determining a model’s success or failure. The need to develop distinct tests is one of the team’s more widely accepted findings, and one that some researchers in the field have already begun to implement.

Creating a modular system

While the human tendency to conflate functional and formal competence may have hindered understanding of LLMs in the past, our human brains could also be the key to unlocking more powerful AIs. 

Leveraging the tools of cognitive neuroscience while a postdoctoral associate at Massachusetts Institute of Technology (MIT), Ivanova and her team studied brain activity in neurotypical individuals via fMRI, and used behavioral assessments of individuals with brain damage to test the causal role of brain regions in language and cognition — both conducting new research and drawing on previous studies. The team’s results showed that human brains use different regions for functional and formal competence, further supporting this distinction in AIs. 

“Our research shows that in the brain, there is a language processing module and separate modules for reasoning,” Ivanova says. This modularity could also serve as a blueprint for how to develop future AIs.

“Building on insights from human brains — where the language processing system is sharply distinct from the systems that support our ability to think — we argue that the language-thought distinction is conceptually important for thinking about, evaluating, and improving large language models, especially given recent efforts to imbue these models with human-like intelligence,” says Ivanova’s former advisor and study co-author Evelina Fedorenko, a professor of brain and cognitive sciences at MIT and a member of the McGovern Institute for Brain Research.

Developing AIs in the pattern of the human brain could help create more powerful systems — while also helping them dovetail more naturally with human users. “Generally, differences in a mechanism’s internal structure affect behavior,” Ivanova says. “Building a system that has a broad macroscopic organization similar to that of the human brain could help ensure that it might be more aligned with humans down the road.” 

In the rapidly developing world of AI, these systems are ripe for experimentation. After the team’s preprint was published, OpenAI announced their intention to add plug-ins to their GPT models. 

“That plug-in system is actually very similar to what we suggest,” Ivanova adds. “It takes a modularity approach where the language model can be an interface to another specialized module within a system.” 

While the OpenAI plug-in system will include features like booking flights and ordering food, rather than cognitively inspired features, it demonstrates that “the approach has a lot of potential,” Ivanova says.

The future of AI — and what it can tell us about ourselves

While our own brains might be the key to unlocking better, more powerful AIs, these AIs might also help us better understand ourselves. “When researchers try to study the brain and cognition, it's often useful to have some smaller system where you can actually go in and poke around and see what's going on before you get to the immense complexity,” Ivanova explains.

However, since human language is unique, findings from animal or other model systems are harder to relate to it. That's where LLMs come in. 

“There are lots of surprising similarities between how one would approach the study of the brain and the study of an artificial neural network” like a large language model, she adds. “They are both information processing systems that have biological or artificial neurons to perform computations.” 

In many ways, the human brain is still a black box, but openly available AIs offer a unique opportunity to see a synthetic system's inner workings, modify variables, and explore these corresponding systems like never before.

“It's a really wonderful model that we have a lot of control over,” Ivanova says. “Neural networks — they are amazing.”

 

Along with Anna (Anya) Ivanova, Kyle Mahowald, and Evelina Fedorenko, the research team also includes Idan Blank (University of California, Los Angeles), as well as Nancy Kanwisher and Joshua Tenenbaum (Massachusetts Institute of Technology).

 

DOI: https://doi.org/10.1016/j.tics.2024.01.011

Researcher Acknowledgements

For helpful conversations, we thank Jacob Andreas, Alex Warstadt, Dan Roberts, Kanishka Misra, students in the 2023 UT Austin Linguistics 393 seminar, the attendees of the Harvard LangCog journal club, the attendees of the UT Austin Department of Linguistics SynSem seminar, Gary Lupyan, John Krakauer, members of the Intel Deep Learning group, Yejin Choi and her group members, Allyson Ettinger, Nathan Schneider and his group members, the UT NLL Group, attendees of the KUIS AI Talk Series at Koç University in Istanbul, Tom McCoy, attendees of the NYU Philosophy of Deep Learning conference and his group members, Sydney Levine, organizers and attendees of the ILFC seminar, and others who have engaged with our ideas. We also thank Aalok Sathe for help with document formatting and references.

Funding sources

Anna (Anya) Ivanova was supported by funds from the Quest Initiative for Intelligence. Kyle Mahowald acknowledges funding from NSF Grant 2104995. Evelina Fedorenko was supported by NIH awards R01-DC016607, R01-DC016950, and U01-NS121471 and by research funds from the Brain and Cognitive Sciences Department, McGovern Institute for Brain Research, and the Simons Foundation through the Simons Center for the Social Brain.

News Contact

Written by Selena Langner

Editor and Press Contact:
Jess Hunt-Ralston
Director of Communications
College of Sciences
Georgia Tech

Nikki MacKenzie

Artificial intelligence is beginning to show its capability to improve both financial reporting and auditing. However, both companies and audit firms will only realize the benefits of AI if their people are open to the information generated by the technology. A new study forthcoming in Review of Accounting Studies attempts to understand how financial executives perceive and respond to the use of AI in both financial reporting and auditing.

In “How do Financial Executives Respond to the Use of Artificial Intelligence in Financial Reporting and Auditing?,” researchers surveyed financial executives (e.g., CFOs, controllers) to assess their perceptions of AI use in their companies’ financial reporting process, as well as the use of AI by their financial statement auditor. The study is authored by Nikki MacKenzie of the Georgia Tech Scheller College of Business, Cassandra Estep from Emory University, and Emily Griffith of the University of Wisconsin.

“We were curious about how financial executives would respond to AI-generated information as we often hear how the financial statements are a joint product of the company and their auditors. While we find that financial executives are rightfully cautious about the use of AI, we do not find that they are averse to its use as has been previously reported. In fact, a number of our survey respondents were excited about AI and see the significant benefits for their companies’ financial reporting process,” says MacKenzie.

Continue reading: The Use of AI by Financial Executives and Their Auditors

Reprinted from Forbes

News Contact

Lorrie Burroughs, Scheller College of Business

GT and Waterloo Partnership

The University of Waterloo and the Board of Regents of the University System of Georgia, representing Georgia Institute of Technology (Georgia Tech), have officially entered a Memorandum of Understanding (MOU) to strengthen academic and research ties between the two institutions. The MOU signifies a commitment to fostering collaborative initiatives in research, education, and other areas of mutual interest. Both universities, recognized for their global impact and innovation, are eager to embark on this journey of cooperation.

Charmaine Dean, Vice-President of Research & International, shared, “The University of Waterloo is pleased to embark on a new collaboration with Georgia Tech, featuring faculty and student exchanges, joint research projects, dual degrees, and conferences. Strengthening ties between our institutions through this collaboration creates a dynamic environment for our faculty and students to foster innovation in many areas of mutual excellence.”

“Georgia Tech is excited to see its NSF AI Institute for Advances in Optimization (AI4OPT), under the leadership of Prof. Pascal Van Hentenryck, partner with experts from the Waterloo Artificial Intelligence Institute of the University of Waterloo. I am really looking forward to the impact that this partnership will have in advancing the fundamental knowledge of AI, in further expanding its applications, and in enabling its wider adoption,” noted Prof. Bernard Kippelen, Vice Provost for International Initiatives at Georgia Tech.

This collaboration is poised to elevate the academic and research landscape of both institutions, promoting global engagement and creating opportunities for students and faculty to thrive in an interconnected world.

News Contact

Breon Martin

Micrograph of a mucinous ovarian tumor (Photo National Institutes of Health)


For over three decades, a highly accurate early diagnostic test for ovarian cancer has eluded physicians. Now, scientists in the Georgia Tech Integrated Cancer Research Center (ICRC) have combined machine learning with information on blood metabolites to develop a new test able to detect ovarian cancer with 93 percent accuracy among samples from the team’s study group.

John McDonald, professor emeritus in the School of Biological Sciences, founding director of the ICRC, and the study’s corresponding author, explains that the new test detects ovarian cancer more accurately than existing tests among women clinically classified as normal, with a particular improvement in detecting early-stage disease in that cohort.

The team’s results and methodologies are detailed in a new paper, “A Personalized Probabilistic Approach to Ovarian Cancer Diagnostics,” published in the March 2024 online issue of the medical journal Gynecologic Oncology. Based on their computer models, the researchers have developed what they believe will be a more clinically useful approach to ovarian cancer diagnosis — whereby a patient’s individual metabolic profile can be used to assign a more accurate probability of the presence or absence of the disease.

“This personalized, probabilistic approach to cancer diagnostics is more clinically informative and accurate than traditional binary (yes/no) tests,” McDonald says. “It represents a promising new direction in the early detection of ovarian cancer, and perhaps other cancers as well.”

The study co-authors also include Dongjo Ban, a Bioinformatics Ph.D. student in McDonald’s lab; Research Scientists Stephen N. Housley, Lilya V. Matyunina, and L.DeEtte (Walker) McDonald; Regents’ Professor Jeffrey Skolnick, who also serves as Mary and Maisie Gibson Chair in the School of Biological Sciences and Georgia Research Alliance Eminent Scholar in Computational Systems Biology; and two collaborating physicians: University of North Carolina Professor Victoria L. Bae-Jump and Ovarian Cancer Institute of Atlanta Founder and Chief Executive Officer Benedict B. Benigno. Members of the research team are forming a startup to transfer and commercialize the technology, and plan to seek requisite trials and FDA approval for the test.

Silent killer

Ovarian cancer is often referred to as the silent killer because the disease is typically asymptomatic when it first arises — and is usually not detected until later stages of development, when it is difficult to treat.

McDonald explains that the average five-year survival rate for late-stage ovarian cancer patients, even after treatment, is around 31 percent — but that if ovarian cancer is detected and treated early, the average five-year survival rate is more than 90 percent.

“Clearly, there is a tremendous need for an accurate early diagnostic test for this insidious disease,” McDonald says.

And although an early detection test for ovarian cancer has been vigorously pursued for more than three decades, an accurate one has proven elusive. Because cancer begins on the molecular level, McDonald explains, there are multiple possible pathways capable of leading to even the same cancer type.

“Because of this high-level molecular heterogeneity among patients, the identification of a single universal diagnostic biomarker of ovarian cancer has not been possible,” McDonald says. “For this reason, we opted to use a branch of artificial intelligence — machine learning — to develop an alternative probabilistic approach to the challenge of ovarian cancer diagnostics.”

Metabolic profiles

Georgia Tech co-author Dongjo Ban, whose thesis research contributed to the study, explains that “because end-point changes on the metabolic level are known to be reflective of underlying changes operating collectively on multiple molecular levels, we chose metabolic profiles as the backbone of our analysis.”

“The set of human metabolites is a collective measure of the health of cells,” adds coauthor Jeffrey Skolnick, “and by not arbitrarily choosing any subset in advance, one lets the artificial intelligence figure out which are the key players for a given individual.”

Mass spectrometry can identify the presence of metabolites in the blood by detecting their mass and charge signatures. However, Ban says, the precise chemical makeup of a metabolite requires much more extensive characterization.

Ban explains that because the precise chemical composition of less than seven percent of the metabolites circulating in human blood has, thus far, been characterized, it is currently impossible to accurately pinpoint the specific molecular processes contributing to an individual's metabolic profile.

However, the research team recognized that, even without knowing the precise chemical make-up of each individual metabolite, the mere presence of different metabolites in the blood of different individuals, as detected by mass spectrometry, can be incorporated as features in the building of accurate machine learning-based predictive models (similar to the use of individual facial features in the building of facial pattern recognition algorithms).

“Thousands of metabolites are known to be circulating in the human bloodstream, and they can be readily and accurately detected by mass spectrometry and combined with machine learning to establish an accurate ovarian cancer diagnostic,” Ban says.
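
As a schematic illustration of this kind of workflow, the sketch below trains an off-the-shelf classifier on synthetic "metabolite intensity" features; the data, classifier, and signal are invented stand-ins, not the team's actual measurements or models.

```python
# Schematic only: synthetic metabolite-intensity features and an off-the-shelf
# classifier, not the study's data or its actual machine-learning models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_samples, n_metabolites = 400, 300
X = rng.lognormal(size=(n_samples, n_metabolites))  # mass-spec peak intensities
y = rng.integers(0, 2, size=n_samples)              # 1 = cancer, 0 = control
X[y == 1, :20] *= 1.5                               # imagined signal in a few peaks

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# predict_proba gives each sample a probability rather than a hard yes/no call,
# echoing the personalized, probabilistic framing of the diagnostic.
probs = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print("cross-validated accuracy on synthetic data:",
      round(float(np.mean((probs > 0.5) == y)), 2))
```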

A new probabilistic approach

The researchers developed their integrative approach by combining metabolomic profiles and machine learning-based classifiers to establish a diagnostic test with 93 percent accuracy when tested on 564 women from Georgia, North Carolina, Philadelphia, and Western Canada. Of the study participants, 431 were active ovarian cancer patients, while the remaining 133 women did not have ovarian cancer.

Further studies are underway to determine whether the test can detect very early-stage disease in women displaying no clinical symptoms, McDonald says.

McDonald anticipates a clinical future where a person with a metabolic profile that falls within a score range that makes cancer highly unlikely would only require yearly monitoring. But someone with a metabolic score that lies in a range where a majority (say, 90%) have previously been diagnosed with ovarian cancer would likely be monitored more frequently — or perhaps immediately referred for advanced screening.

Citation: https://doi.org/10.1016/j.ygyno.2023.12.030

Funding

This research was funded by the Ovarian Cancer Institute (Atlanta), the Laura Crandall Brown Foundation, the Deborah Nash Endowment Fund, Northside Hospital (Atlanta), and the Mark Light Integrated Cancer Research Student Fellowship.

Disclosure

Study co-authors John McDonald, Stephen N. Housley, Jeffrey Skolnick, and Benedict B. Benigno are the co-founders of MyOncoDx, Inc., formed to support further research, technology transfer, and commercialization for the team’s new clinical tool for the diagnosis of ovarian cancer.

News Contact

Writer: Renay San Miguel
Communications Officer II/Science Writer
College of Sciences
404-894-5209

Editor: Jess Hunt-Ralston

 

GTRI Machine Learning Project Leads

GTRI has developed a dashboard that aids in the DoD's development and testing of AI and ML models that would be utilized during real-time decision-making situations. Pictured from L to R are the two project leads, GTRI Research Engineer Austin Ruth and GTRI Senior Research Engineer Jovan Munroe (Photo Credit: Sean McNeil, GTRI).

GTRI MLOps team

The MLOps team poses with GTRI Chief Technology Officer Mark Whorton (far left) and GTRI Director Jim Hudgens (second from left) after winning an IRAD of the Year award for their work on this project at GTRI's FY23 IRAD Extravaganza event (Photo Credit: Sean McNeil, GTRI).

Machine learning (ML) has transformed the digital landscape with its unprecedented ability to automate complex tasks and improve decision-making processes. However, many organizations, including the U.S. Department of Defense (DoD), still rely on time-consuming methods for developing and testing machine learning models, which can create strategic vulnerabilities in today’s fast-changing environment. 

The Georgia Tech Research Institute (GTRI) is addressing this challenge by developing a Machine Learning Operations (MLOps) platform that standardizes the development and testing of artificial intelligence (AI) and ML models to enhance the speed and efficiency with which these models are utilized during real-time decision-making situations.   

“It’s been difficult for organizations to transition these models from a research environment and turn them into fully-functional products that can be used in real-time,” said Austin Ruth, a GTRI research engineer who is leading this project. “Our goal is to bring AI/ML to the tactical edge where it could be used during active threat situations to heighten the survivability of our warfighters.” 

Rather than treating ML development in isolation, GTRI’s MLOps platform would bridge the gap between data scientists and field operations so that organizations can oversee the entire lifecycle of ML projects from development to deployment at the tactical edge. 

The tactical edge refers to the immediate operational space where decisions are made and actions take place. Bringing AI and ML capabilities closer to the point of action would enhance the speed, efficiency and effectiveness of decision-making processes and contribute to more agile and adaptive responses to threats. 

“We want to develop a system where fighter jets or warships don’t have to do any data transfers but could train and label the data right where they are and have the AI/ML models improve in real-time as they’re actively going up against threats,” said Ruth.   

For example, a model could monitor a plane’s altitude and speed, immediately spot potential wing drag issues and alert the pilot about it. In an electronic warfare (EW) situation when facing enemy aircraft or missiles, the models could process vast amounts of incoming data to more quickly identify threats and recommend effective countermeasures in real time. 

AI/ML models need to be trained and tested to ensure their effectiveness in adapting to new, unseen data. Without a standardized process in place, however, training and testing are done in a fragmented manner, which poses several risks: overfitting, where a model performs well on its training data but makes inaccurate predictions or decisions on unseen, real-world data; security vulnerabilities, where bad actors exploit weaknesses in the models; and a general lack of robustness and inefficient use of resources.

“Throughout this project, we noticed that training and testing are often done in a piecemeal fashion and thus aren’t repeatable,” said Jovan Munroe, a GTRI senior research engineer who is also leading this project. “Our MLOps platform makes the training and testing process more consistent and well-defined so that these models are better equipped to identify and address unknown variables in the battle space.” 
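
At its simplest, repeatable evaluation means fixing random seeds and scoring every model against a held-out split it never trained on. The generic scikit-learn sketch below illustrates that basic idea; it is a stand-in example, not GTRI's MLOps platform.

```python
# A minimal illustration of a repeatable train/evaluate step: fixed seeds and a
# held-out test split guard against overfitting. Generic code, not GTRI's platform.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)   # same split every run

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```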

This project has been supported by GTRI’s Independent Research and Development (IRAD) Program, winning an IRAD of the Year award in fiscal year 2023. In fiscal year 2024, the project received funding from a U.S. government sponsor. 

 

Writer: Anna Akins 
Photos: Sean McNeil 
GTRI Communications
Georgia Tech Research Institute
Atlanta, Georgia

The Georgia Tech Research Institute (GTRI) is the nonprofit, applied research division of the Georgia Institute of Technology (Georgia Tech). Founded in 1934 as the Engineering Experiment Station, GTRI has grown to more than 2,900 employees, supporting eight laboratories in over 20 locations around the country and performing more than $940 million of problem-solving research annually for government and industry. GTRI's renowned researchers combine science, engineering, economics, policy, and technical expertise to solve complex problems for the U.S. federal government, state, and industry.

News Contact

Michelle Gowdy, (Interim) Director of Communications

Michelle.Gowdy@gtri.gatech.edu

404-407-8060