Georgia Tech students and staff members gathered at the Advanced Manufacturing Pilot Facility with members of OPEN MIND for the training.

As automation and AI continue to transform the manufacturing industry, the need for seamless integration across all production stages has reached an all-time high. By digitally designing products, controlling the machinery that builds them, and collecting precise data at each step, digital integration streamlines the entire manufacturing process — cutting down on waste materials, cost, and production time.  

Recently, the Georgia Tech Manufacturing Institute (GTMI) teamed up with OPEN MIND Technologies to host an immersive, weeklong training session on hyperMILL, an advanced manufacturing software suite that enables this digital integration.

OPEN MIND, the developer of hyperMILL, has been a longtime supporter of research operations in Georgia Tech’s Advanced Manufacturing Pilot Facility (AMPF). “Our adoption of their software solutions has allowed us to explore the full potential of machines and to make sure we keep forging new paths,” said Steven Ferguson, a principal research scientist at GTMI. 

Software like hyperMILL helps plan the most efficient and accurate way to cut, shape, or 3D print materials on different machines, making the process faster and easier. Hosted at the AMPF, the immersive training offered 10 staff members and students a hands-on platform to use the software while practicing machining and additive manufacturing techniques. 

“The number of new features and tricks that the software has every year makes it advantageous to stay current and get a refresher course,” said Alan Burl, a Ph.D. student in the George W. Woodruff School of Mechanical Engineering who attended the training session. “More advanced users can learn new tips and tricks while simultaneously exposing new users to the power of a fully featured, computer-aided manufacturing software.” 

OPEN MIND Technologies has partnered with Georgia Tech for over five years to support digital manufacturing research, offering biannual training in their latest software to faculty and students. 

“Meeting the new graduate students each fall is something that I look forward to,” said Brad Rooks, an application engineer at OPEN MIND and one of the co-leaders of the training session. “This particular group posed questions that were intuitive and challenging to me as a trainer — their inquisitive nature drove me to look at our software from fresh perspectives.” 

The company is also a member of GTMI’s Manufacturing 4.0 Consortium, a membership-based group that unites industry, academia, and government to develop and implement advanced manufacturing technologies and train the workforce for the market. 

“The strong reputation of GTMI in the manufacturing industry, and more importantly, the reputation of the students, faculty, and researchers who support research within our facilities, enables us to forge strategic partnerships with companies like OPEN MIND,” says Ferguson, who also serves as executive director of the consortium. “These relationships are what makes working with and within GTMI so special.” 

News Contact

Audra Davidson
Research Communications Program Manager
Georgia Tech Manufacturing Institute

CSE NeurIPS 2024

Georgia Tech researchers have created a dataset that trains computer models to understand nuances in human speech during financial earnings calls. The dataset provides a new resource to study how public correspondence affects businesses and markets. 

SubjECTive-QA is the first human-curated dataset on question-answer pairs from earnings call transcripts (ECTs). The dataset teaches models to identify subjective features in ECTs, like clarity and cautiousness.   

The dataset lays the foundation for a new approach to identifying disinformation and misinformation caused by nuances in speech. While ECT responses can be technically true, unclear or irrelevant information can misinform stakeholders and affect their decision-making. 

Tests on White House press briefings showed that the dataset applies to other sectors with frequent question-and-answer encounters, notably politics, journalism, and sports. This increases the odds of effectively informing audiences and improving transparency across public spheres.   

This work at the intersection of natural language processing and finance earned the paper acceptance to NeurIPS 2024, the 38th Annual Conference on Neural Information Processing Systems. NeurIPS is one of the world’s most prestigious conferences on artificial intelligence (AI) and machine learning (ML) research.

"SubjECTive-QA has the potential to revolutionize nowcasting predictions with enhanced clarity and relevance,” said Agam Shah, the project’s lead researcher. 

“Its nuanced analysis of qualities in executive responses, like optimism and cautiousness, deepens our understanding of economic forecasts and financial transparency."

[MICROSITE: Georgia Tech at NeurIPS 2024]

SubjECTive-QA offers a new means to evaluate financial discourse by characterizing language's subjective and multifaceted nature. This improves on traditional datasets that quantify sentiment or verify claims from financial statements.

The dataset consists of 2,747 Q&A pairs taken from 120 ECTs from companies listed on the New York Stock Exchange from 2007 to 2021. The Georgia Tech researchers annotated each response by hand based on six features for a total of 49,446 annotations.

The group evaluated answers on:

  • Relevance: the speaker answered the question with appropriate details.
  • Clarity: the speaker was transparent in the answer and the message conveyed.
  • Optimism: the speaker answered with a positive outlook regarding future outcomes.
  • Specificity: the speaker included sufficient and technical details in their answer.
  • Cautiousness: the speaker answered using a conservative, risk-averse approach.
  • Assertiveness: the speaker answered with certainty about the company’s events and outcomes.
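
For concreteness, a single annotated Q&A pair might be represented like the minimal sketch below. The field names and the integer score scale are illustrative assumptions, not the dataset’s published schema.

```python
from dataclasses import dataclass, field

# Hypothetical record layout for one annotated earnings-call Q&A pair.
# Field names and the 0-2 score scale are assumptions for illustration.
@dataclass
class QAPair:
    question: str
    answer: str
    scores: dict = field(default_factory=dict)  # one score per feature

pair = QAPair(
    question="How do you expect margins to trend next quarter?",
    answer="We remain optimistic, though macro headwinds make guidance hard.",
    scores={
        "relevance": 2, "clarity": 1, "optimism": 2,
        "specificity": 0, "cautiousness": 2, "assertiveness": 1,
    },
)
print(pair.scores["cautiousness"])  # -> 2
```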

The Georgia Tech group validated their dataset by training eight computer models to detect and score these six features. The test models comprised three BERT-based pre-trained language models (PLMs) and five popular large language models (LLMs), including Llama and ChatGPT. 
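
To make the PLM setup concrete, here is a minimal sketch of a multi-label scorer for the six features using Hugging Face Transformers. The bert-base-uncased checkpoint, the sigmoid scoring, and every hyperparameter are illustrative assumptions; the paper’s actual models and training details may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

FEATURES = ["relevance", "clarity", "optimism",
            "specificity", "cautiousness", "assertiveness"]

# Assumed checkpoint; the study's exact PLMs are not reproduced here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(FEATURES),
    problem_type="multi_label_classification",  # sigmoid + BCE loss
)

text = "Q: How will margins trend? A: We expect gradual improvement."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
scores = torch.sigmoid(logits).squeeze().tolist()  # one score per feature
print(dict(zip(FEATURES, (round(s, 2) for s in scores))))
```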

All eight models scored the highest on the relevance and clarity features. This is attributed to domain-specific pretraining that enables the models to identify pertinent and understandable material.

The PLMs achieved higher scores on the clarity, optimism, specificity, and cautiousness features. The LLMs scored higher in assertiveness and relevance. 

In another experiment to test transferability, a PLM trained with SubjECTive-QA evaluated 65 Q&A pairs from White House press briefings and gaggles. Scores across all six features indicated models trained on the dataset could succeed in other fields outside of finance. 

"Building on these promising results, the next step for SubjECTive-QA is to enhance customer service technologies, like chatbots,” said Shah, a Ph.D. candidate studying machine learning. 

“We want to make these platforms more responsive and accurate by integrating our analysis techniques from SubjECTive-QA."

SubjECTive-QA culminated from two semesters of work through Georgia Tech’s Vertically Integrated Projects (VIP) Program. The VIP Program is an approach to higher education where undergraduate and graduate students work together on long-term project teams led by faculty. 

Undergraduate students earn academic credit and receive hands-on experience through VIP projects. The extra help advances ongoing research and gives graduate students mentorship experience.

Computer science major Huzaifa Pardawala and mathematics major Siddhant Sukhani co-led the SubjECTive-QA project with Shah. 

Fellow collaborators included Veer Kejriwal, Abhishek Pillai, Rohan Bhasin, Andrew DiBiasio, Tarun Mandapati, and Dhruv Adha. All six researchers are undergraduate students studying computer science. 

Sudheer Chava co-advises Shah and is the faculty lead of SubjECTive-QA. Chava is a professor in the Scheller College of Business and director of the M.S. in Quantitative and Computational Finance (QCF) program.

Chava is also an adjunct faculty member in the College of Computing’s School of Computational Science and Engineering (CSE).

"Leading undergraduate students through the VIP Program taught me the powerful impact of balancing freedom with guidance,” Shah said. 

“Allowing students to take the helm not only fosters their leadership skills but also enhances my own approach to mentoring, thus creating a mutually enriching educational experience.”

Presenting SubjECTive-QA at NeurIPS 2024 exposes the dataset for further use and refinement. NeurIPS is one of three primary international conferences on high-impact research in AI and ML. The conference occurs Dec. 10-15.

The SubjECTive-QA team is among the 162 Georgia Tech researchers presenting over 80 papers at NeurIPS 2024. The Georgia Tech contingent includes 46 faculty members, like Chava. These faculty represent Georgia Tech’s Colleges of Business, Computing, Engineering, and Sciences, underscoring the pertinence of AI research across domains. 

"Presenting SubjECTive-QA at prestigious venues like NeurIPS propels our research into the spotlight, drawing the attention of key players in finance and tech,” Shah said.

“The feedback we receive from this community of experts validates our approach and opens new avenues for future innovation, setting the stage for transformative applications in industry and academia.”

News Contact

Bryant Wine, Communications Officer
bryant.wine@cc.gatech.edu

CSE NeurIPS 2024

A new machine learning (ML) model from Georgia Tech could protect communities from diseases, better manage electricity consumption in cities, and promote business growth, all at the same time.

Researchers from the School of Computational Science and Engineering (CSE) created the Large Pre-Trained Time-Series Model (LPTM) framework. LPTM is a single foundational model that completes forecasting tasks across a broad range of domains. 

Along with performing as well as or better than models purpose-built for their applications, LPTM requires 40% less data and 50% less training time than current baselines. In some cases, LPTM can be deployed without any training data.

The key to LPTM is that it is pre-trained on datasets from different industries like healthcare, transportation, and energy. The Georgia Tech group created an adaptive segmentation module to make effective use of these vastly different datasets.

The Georgia Tech researchers will present LPTM in Vancouver, British Columbia, Canada, at the 2024 Conference on Neural Information Processing Systems (NeurIPS 2024). NeurIPS is one of the world’s most prestigious conferences on artificial intelligence (AI) and ML research.

“The foundational model paradigm started with text and image, but people haven’t explored time-series tasks yet because those were considered too diverse across domains,” said B. Aditya Prakash, one of LPTM’s developers. 

“Our work is a pioneer in this new area of exploration, where only a few attempts have been made so far.”

[MICROSITE: Georgia Tech at NeurIPS 2024]

Foundational models are trained on data from many different fields, making them powerful general-purpose tools. They drive GPT, DALL-E, and other popular generative AI platforms used today. LPTM is different, though, because it is geared toward time-series data, not text and image generation.

The Georgia Tech researchers trained LPTM on datasets spanning epidemics, macroeconomics, power consumption, traffic and transportation, stock markets, and human motion and behavior.

After training, the group pitted LPTM against 17 other models on nine real-world forecasting benchmarks. LPTM performed best on five of the benchmarks and placed second on the other four.

The nine benchmarks contained data from real-world collections. These included the spread of influenza in the U.S. and Japan; electricity, traffic, and taxi demand in New York; and financial markets.

The competitor models were purpose-built for their fields. While each model performed well on one or two benchmarks closest to its designed purpose, the models ranked in the middle or bottom on others.

In another experiment, the Georgia Tech group tested LPTM against seven baseline models on the same nine benchmarks in zero-shot forecasting tasks. Zero-shot means the model is used out of the box, without any task-specific training data. LPTM outperformed every model across all benchmarks in this trial.

LPTM was a consistent top performer across all nine benchmarks, demonstrating the model’s potential to achieve superior forecasting results across multiple applications with less data and fewer resources.

“Our model also goes beyond forecasting and helps accomplish other tasks,” said Prakash, an associate professor in the School of CSE. 

“Classification is a useful time-series task that allows us to understand the nature of the time-series and label whether that time-series is something we understand or is new.”

One reason traditional models are custom-built to their purpose is that fields differ in reporting frequency and trends. 

For example, epidemic data is often reported weekly and goes through seasonal peaks with occasional outbreaks. Economic data is captured quarterly and typically remains consistent and monotone over time. 

LPTM’s adaptive segmentation module allows it to overcome these timing differences across datasets. When LPTM receives a dataset, the module breaks the data into segments of different sizes. It then scores the possible segmentations and chooses the one from which useful patterns are easiest to learn.
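
In spirit, the module can be sketched as a search over candidate segmentations. The sketch below is a loose illustration, not LPTM’s published algorithm: the fixed window sizes and the mean-residual “ease of learning” score are stand-in assumptions for the learned criterion the real module uses.

```python
import numpy as np

def candidate_segmentations(n, sizes=(4, 8, 16)):
    """Yield ways to cut a length-n series into fixed-size windows."""
    for size in sizes:
        bounds = list(range(0, n, size)) + [n]
        yield [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

def segment_score(series, segments):
    """Stand-in 'ease of learning' score: total residual after
    approximating each segment by its mean (lower = easier)."""
    return sum(float(np.sum((series[lo:hi] - series[lo:hi].mean()) ** 2))
               for lo, hi in segments)

series = np.sin(np.linspace(0, 12, 128)) + 0.1 * np.random.randn(128)
best = min(candidate_segmentations(len(series)),
           key=lambda segs: segment_score(series, segs))
print(f"chose {len(best)} segments of size {best[0][1] - best[0][0]}")
```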

LPTM’s performance, enhanced through the innovation of adaptive segmentation, earned the model acceptance to NeurIPS 2024 for presentation. NeurIPS is one of three primary international conferences on high-impact research in AI and ML. NeurIPS 2024 occurs Dec. 10-15.

Ph.D. student Harshavardhan Kamarthi partnered with Prakash, his advisor, on LPTM. The duo are among the 162 Georgia Tech researchers presenting over 80 papers at the conference. 

Prakash is one of 46 Georgia Tech faculty with research accepted at NeurIPS 2024. Nine School of CSE faculty members, nearly one-third of the body, are authors or co-authors of 17 papers accepted at the conference. 

Along with sharing their research at NeurIPS 2024, Prakash and Kamarthi released an open-source library of foundational time-series modules that data scientists can use in their applications.

“Given the interest in AI from all walks of life, including business, social, and research and development sectors, a lot of work has been done and thousands of strong papers are submitted to the main AI conferences,” Prakash said. 

“Acceptance of our paper speaks to the quality of the work and its potential to advance foundational methodology, and we hope to share that with a larger audience.”

News Contact

Bryant Wine, Communications Officer
bryant.wine@cc.gatech.edu

SMART USA logo

The Department of Commerce has granted the Semiconductor Research Corporation (SRC), its partners, and the Georgia Institute of Technology $285 million to establish and operate the 18th Manufacturing USA Institute. The Semiconductor Manufacturing and Advanced Research with Twins (SMART USA) Institute will focus on using digital twins to accelerate the development and deployment of microelectronics. SMART USA, with more than 150 expected partner entities representing industry, academia, and the full spectrum of supply chain design and manufacturing, will span more than 30 states and have combined funding totaling $1 billion.

This is the first-of-its-kind CHIPS Manufacturing USA Institute. 

“Georgia Tech’s role in the SMART USA Institute amplifies our trailblazing chip and advanced packaging research and leverages the strengths of our interdisciplinary research institutes,” said Tim Lieuwen, interim executive vice president for Research. “We believe innovation thrives where disciplines and sectors intersect. And the SMART USA Institute will help us ensure that the benefits of our semiconductor and advanced packaging discoveries extend beyond our labs, positively impacting the economy and quality of life in Georgia and across the United States.” 

The 3D Systems Packaging Research Center (PRC), directed by School of Electrical and Computer Engineering Dan Fielder Professor Muhannad Bakir, played an integral role in developing the winning proposal. Georgia Tech will be designated as the Digital Innovation Semiconductor Center (DISC) for the Southeastern U.S.  

“We are honored to collaborate with SRC and their team on this new Manufacturing USA Institute. Our partnership with SRC spans more than two decades, and we are thrilled to continue this collaboration by leveraging the Institute’s wide range of semiconductor and advanced packaging expertise,” said Bakir. 

Through the Institute of Matter and Systems’ core facilities, housed in the Marcus Nanotechnology Building, DISC will accelerate semiconductor and advanced packaging development. 

“The awarding of the Digital Twin Manufacturing USA Institute is a culmination of more than three years of work with the Semiconductor Research Corporation and other valued team members who share a similar vision of advancing U.S. leadership in semiconductors and advanced packaging,” said George White, senior director for strategic partnerships at Georgia Tech. 

“As a founding member of the SMART USA Institute, Georgia Tech values this long-standing partnership. Its industry and academic partners, including the HBCU CHIPS Network, stand ready to make significant contributions to realize the goals and objectives of the SMART USA Institute,” White added. 

 Georgia Tech also plans to capitalize on the supply chain and optimization strengths of the No. 1-ranked H. Milton Stewart School of Industrial and Systems Engineering (ISyE). ISyE experts will help develop supply-chain digital twins to optimize and streamline manufacturing and operational efficiencies. 

David Henshall, SRC vice president of Business Development, said, “The SMART USA Institute will advance American digital twin technology and apply it to the full semiconductor supply chain, enabling rapid process optimization, predictive maintenance, and agile responses to chips supply chain disruptions. These efforts will strengthen U.S. global competitiveness, ensuring our country reaps the rewards of American innovation at scale.”  

News Contact

Amelia Neumeister | Research Communications Program Manager

Leadership at the Memorandum of Understanding signing with the Korea Institute of Industrial Technology (KITECH). From left to right: Sangpyo Suh, consul general of Korea in Atlanta; Chaouki Abdallah, former executive vice president of Research at Georgia Tech; Sang Mok Lee, president of KITECH; and Barton Lowrey, director of the Georgia Department of Economic Development.

Shreyes Melkote, associate director of the Georgia Tech Manufacturing Institute, signing the Memorandum of Understanding with the Korea Automotive Technology Institute.

Na-Seung Sik, president of the Korea Automotive Technology Institute, signing the Memorandum of Understanding with Georgia Tech at the Georgia Tech Manufacturing Institute.

In a significant step towards fostering international collaboration and advancing cutting-edge technologies in manufacturing, Georgia Tech recently signed Memorandums of Understanding (MoUs) with the Korea Institute of Industrial Technology (KITECH) and the Korea Automotive Technology Institute (KATECH). Facilitated by the Georgia Tech Manufacturing Institute (GTMI), this landmark event underscores Georgia Tech’s commitment to global partnerships and innovation in manufacturing and automotive technologies. 

“This is a great fit for the institute, the state of Georgia, and the United States, enhancing international cooperation,” said Thomas Kurfess, GTMI executive director and Regents’ Professor in the George W. Woodruff School of Mechanical Engineering (ME). “An MoU like this really gives us an opportunity to bring together a larger team to tackle international problems.” 

“An MoU signing between Georgia Tech and entities like KITECH and KATECH signifies a formal agreement to pursue shared goals and explore collaborative opportunities, including joint research projects, academic exchanges, and technological advancements,” said Seung-Kyum Choi, an associate professor in ME and a major contributor in facilitating both partnerships. “Partnering with these influential institutions positions Georgia Tech to expand its global footprint and enhance its impact, particularly in areas like AI-driven manufacturing and automotive technologies.” 

The state of Georgia has seen significant growth in investments from Korean companies. Over the past decade, approximately 140 Korean companies have committed around $23 billion to various projects in Georgia, creating over 12,000 new jobs in 2023 alone. This influx of investment underscores the strong economic ties between Georgia and South Korea, further bolstered by partnerships like those with KITECH and KATECH. 

“These partnerships not only provide access to new resources and advanced technologies,” says Choi, “but create opportunities for joint innovation, furthering GTMI’s mission to drive transformative breakthroughs in manufacturing on a global scale.”  

The MoUs with KITECH and KATECH are expected to facilitate a wide range of collaborative activities, including joint research projects that leverage the strengths of both institutions, academic exchanges that enrich the educational experiences of students and faculty, and technological advancements that push the boundaries of current manufacturing and automotive technologies. 

“My hopes for the future of Georgia Tech’s partnerships with KITECH and KATECH are centered on fostering long-term, impactful collaborations that drive innovation in manufacturing and automotive technologies,” Choi noted. “These partnerships do not just expand our reach; they solidify our leadership in shaping the future of manufacturing, keeping Georgia Tech at the forefront of industry breakthroughs worldwide.” 

Georgia Tech has a history of successful collaborations with Korean companies, including a multidecade partnership with Hyundai. Recently, the Institute joined forces with the Korea Institute for Advancement of Technology (KIAT) to establish the KIAT-Georgia Tech Semiconductor Electronics Center to advance semiconductor research, fostering sustainable partnerships between Korean companies and Georgia Tech researchers. 

“Partnering with KATECH and KITECH goes beyond just technological innovation,” said Kurfess, “it really enhances international cooperation, strengthens local industry, drives job creation, and boosts Georgia’s economy.” 

News Contact

Audra Davidson
Research Communications Program Manager
Georgia Tech Manufacturing Institute

Deven Desai and Mark Riedl

Deven Desai and Mark Riedl have seen the signs for a while. 

In the two years since OpenAI introduced ChatGPT, dozens of lawsuits have been filed alleging that technology companies infringed copyright by using published works to train artificial intelligence (AI) models.

Academic AI research efforts could be significantly hindered if courts rule in the plaintiffs' favor. 

Desai and Riedl are Georgia Tech researchers raising awareness about how these court rulings could force academic researchers to construct new AI models with limited training data. The two collaborated on a benchmark academic paper that examines the landscape of the ethical issues surrounding AI and copyright in industry and academic spaces.

“There are scenarios where courts may overreact to having a book corpus on your computer, and you didn’t pay for it,” Riedl said. “If you trained a model for an academic paper, as my students often do, that’s not a problem right now. The courts could deem training is not fair use. That would have huge implications for academia.

“We want academics to be free to do their research without fear of repercussions in the marketplace because they’re not competing in the marketplace,” Riedl said. 

Desai is the Sue and John Stanton Professor of Business Law and Ethics at the Scheller College of Business. He researches how business interests and new technology shape privacy, intellectual property, and competition law. Riedl is a professor at the College of Computing’s School of Interactive Computing, researching human-centered AI, generative AI, explainable AI, and gaming AI. 

Their paper, Between Copyright and Computer Science: The Law and Ethics of Generative AI, was published in the Northwestern Journal of Technology and Intellectual Property on Monday.

Desai and Riedl say they want to offer solutions that balance the interests of various stakeholders. But that requires compromise from all sides.

Researchers should accept they may have to pay for the data they use to train AI models. Content creators, on the other hand, should receive compensation, but they may need to accept less money to ensure data remains affordable for academic researchers to acquire.

Who Benefits?

The doctrine of fair use is at the center of every copyright debate. According to the U.S. Copyright Office, fair use permits the unlicensed use of copyright-protected works in certain circumstances, such as distributing information for the public good, including teaching and research.

Fair use is often challenged when one or more parties profit from published works without compensating the authors.

Any original published content, including a personal website on the internet, is protected by copyright. However, copyrighted material is republished on websites or posted on social media innumerable times every day without the consent of the original authors. 

In most cases, it’s unlikely copyright violators gained financially from their infringement.

But Desai said business-to-business cases are different. The New York Times is one of many daily newspapers and media companies that have sued OpenAI for using its content as training data. Microsoft is also a defendant in The New York Times’ suit because it invested billions of dollars into OpenAI’s development of AI tools like ChatGPT.

“You can take a copyrighted photo and put it in your Twitter post or whatever you want,” Desai said. “That’s probably annoying to the owner. Economically, they probably wanted to be paid. But that’s not business to business. What’s happening with Open AI and The New York Times is business to business. That’s big money.”

OpenAI started as a nonprofit dedicated to the safe development of artificial general intelligence (AGI) — AI that, in theory, can rival human thinking and possess autonomy.

These AI models would require massive amounts of data and expensive supercomputers to process that data. OpenAI could not raise enough money to afford such resources, so it created a for-profit arm controlled by its parent nonprofit.

Desai, Riedl, and many others argue that OpenAI ceased its research mission for the public good and began developing consumer products. 

“If you’re doing basic research that you’re not releasing to the world, it doesn’t matter if every so often it plagiarizes The New York Times,” Riedl said. “No one is economically benefitting from that. When they became a for-profit and produced a product, now they were making money from plagiarized text.”

OpenAI’s for-profit arm is valued at $80 billion, but content creators have not received a dime, even though the company has scraped massive amounts of copyrighted material as training data.

The New York Times has posted warnings on its sites that its content cannot be used to train AI models. Many other websites offer a robots.txt file that contains instructions for bots about which pages can and cannot be accessed. 

Neither of these measures is legally binding, and both are often ignored.
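
For illustration, a site that wants to opt out of AI training crawls might publish a robots.txt like the sketch below. GPTBot is OpenAI’s publicly documented crawler; the rest is generic, and, as noted above, compliance is voluntary.

```
# Advisory only: compliant crawlers honor this, but nothing enforces it.
User-agent: GPTBot
Disallow: /

# All other crawlers may access the site.
User-agent: *
Allow: /
```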

Solutions

Desai and Riedl offer a few options for companies to show good faith in rectifying the situation.

  • Spend the money. Desai says OpenAI and Microsoft could have afforded to pay for their training data and avoided the hassle of legal consequences.

    “If you do the math on the costs to buy the books and copy them, they could have paid for them,” he said. “It would’ve been a multi-million dollar investment, but they’re a multi-billion dollar company.”
     
  • Be selective. Models can be trained on randomly selected texts from published works, allowing the model to understand the writing style without plagiarizing (see the sketch after this list). 

    “I don’t need the entire text of War and Peace,” Desai said. “To capture the way authors express themselves, I might only need a hundred pages. I’ve also reduced the chance that my model will cough up entire texts.”
     
  • Leverage libraries. The authors agree libraries could serve as an ideal middle ground as a place to store published works and compensate authors for access to those works, though the amount may be less than desired.

    “Most of the objections you could raise are taken care of,” Desai said. “They are legitimate access copies that are secure. You get access to only as much as you need. Libraries at universities have already become schools of information.”
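
As a rough sketch of the “be selective” option above, the snippet below samples scattered excerpts from a work instead of ingesting its full text. The file name, excerpt length, and excerpt count are illustrative assumptions, not a method from the paper.

```python
import random

def sample_excerpts(text, n_excerpts=10, excerpt_chars=2000, seed=0):
    """Draw a handful of random excerpts from a work rather than
    training on the complete text."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, max(1, len(text) - excerpt_chars))
              for _ in range(n_excerpts)]
    return [text[s:s + excerpt_chars] for s in starts]

book = open("war_and_peace.txt").read()  # assumed local copy of the text
snippets = sample_excerpts(book)
print(f"kept {sum(len(s) for s in snippets)} of {len(book)} characters")
```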

Desai and Riedl hope the legal action taken by publications like The New York Times will send a message to companies that develop AI tools to pump the brakes. If they don’t, researchers uninterested in profit could pay the steepest price.

The authors say it’s not a new problem but is reaching a boiling point.

“In the history of copyright, there are ways that society has dealt with the problem of compensating creators and technology that copies or reduces your ability to extract money from your creation,” Desai said. “We wanted to point out there’s a way to get there.”

News Contact

Nathan Deen
Communications Officer
School of Interactive Computing

Camille Harris

The Automatic Speech Recognition (ASR) models that power voice assistants like Amazon Alexa may have difficulty transcribing English speakers with minority dialects.

A study by Georgia Tech and Stanford researchers compared the transcription performance of leading ASR models for people using Standard American English (SAE) and three minority dialects — African American Vernacular English (AAVE), Spanglish, and Chicano English.

Interactive Computing Ph.D. student Camille Harris is the lead author of a paper accepted to the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP) this week in Miami.

Harris recruited people who spoke each dialect and had them read from a Spotify podcast dataset, which includes podcast audio and metadata. Harris then used three ASR models — wav2vec 2.0, HuBERT, and Whisper — to transcribe the audio and compare their performances.
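
A comparison like this can be sketched in a few lines. The snippet below is a loose illustration, assuming local audio clips and human reference transcripts for each dialect; it scores Whisper with word error rate via the jiwer library, a standard ASR metric, though the study’s exact pipeline may differ.

```python
from transformers import pipeline
from jiwer import wer

# Assumed local files: one clip and one human transcript per dialect.
clips = {
    "SAE": ("sae_clip.wav", "reference transcript of the SAE clip"),
    "AAVE": ("aave_clip.wav", "reference transcript of the AAVE clip"),
}

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

for dialect, (path, reference) in clips.items():
    hypothesis = asr(path)["text"]
    # Lower word error rate = more accurate transcription for the group.
    print(dialect, round(wer(reference.lower(), hypothesis.lower()), 3))
```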

For each model, Harris found that SAE transcriptions were significantly more accurate than those of each minority dialect. The models also transcribed men who spoke SAE more accurately than women who spoke SAE. Speakers of Spanglish and Chicano English received the least accurate transcriptions of the test groups. 

While the models transcribed SAE-speaking women less accurately than their male counterparts, that did not hold true across minority dialects. Minority men had the most inaccurate transcriptions of all demographics in the study.

“I think people would expect if women generally perform worse and minority dialects perform worse, then the combination of the two must also perform worse,” Harris said. “That’s not what we observed. 

“Sometimes minority dialect women performed better than Standard American English. We found a consistent pattern that men of color, particularly Black and Latino men, could be at the highest risk for these performance errors.”

Addressing underrepresentation

Harris said the cause of that outcome starts with the training data used to build these models. Model performance reflected the underrepresentation of minority dialects in the data sets.

AAVE performed best under the Whisper model, which Harris said had the most inclusive training data of minority dialects.

Harris also looked at whether her findings mirrored existing systems of oppression. Black men have high incarceration rates and are one of the groups most targeted by police. Harris said there could be a correlation between that and the low rate of Black men enrolled in universities, which leads to less representation in technology spaces.

“Minority men performing worse than minority women doesn’t necessarily mean minority men are more oppressed,” she said. “They may be less represented than minority women in computing and the professional sector that develops these AI systems.”

Harris also had to be cautious of a few variables among AAVE, including code-switching and various regional subdialects.

Harris noted in her study there were cases of code-switching to SAE. Speakers who code-switched were transcribed more accurately than speakers who did not. 

Harris also tried to include different regional speakers.

“It’s interesting from a linguistic and history perspective if you look at migration patterns of Black folks — perhaps people moving from a southern state to a northern state over time creates different linguistic variations,” she said. “There are also generational variations in that older Black Americans may speak differently from younger folks. I think the variation was well represented in our data. We wanted to be sure to include that for robustness.”

TikTok barriers

Harris said she built her study on a paper she authored that examined user-design barriers and biases faced by Black content creators on TikTok. She presented that paper at the Association for Computing Machinery’s (ACM) 2023 Conference on Computer-Supported Cooperative Work. 

Those content creators depended on TikTok for a significant portion of their income. When providing captions for videos grew in popularity, those creators noticed the ASR tool built into the app inaccurately transcribed them. That forced the creators to manually input their captions, while SAE speakers could use the ASR feature to their benefit.

“Minority users of these technologies will have to be more aware and keep in mind that they’ll probably have to do a lot more customization because things won’t be tailored to them,” Harris said.

Harris said there are ways that designers of ASR tools could work toward being more inclusive of minority dialects, but cultural challenges could arise.

“It could be difficult to collect more minority speech data, and you have to consider consent with that,” she said. “Developers need to be more community-engaged to think about the implications of their models and whether it’s something the community would find helpful.”

News Contact

Nathan Deen
Communications Officer
School of Interactive Computing

Marcus Nanotechnology Building

The Institute for Matter and Systems (IMS) at Georgia Tech has announced the fall 2024 core facility seed grant recipients. The primary purpose of this program is to give graduate students in diverse disciplines working on original and unfunded research in micro- and nanoscale projects the opportunity to access the most advanced academic cleanroom space in the Southeast. In addition to accessing the labs' high-level fabrication, lithography, and characterization tools, the awardees will have the opportunity to gain proficiency in cleanroom and tool methodology and access the consultation services provided by research staff members in IMS. Seed Grant awardees are also provided travel support to present their research at a scientific conference.

In addition to student research skill development, this biannual grant program gives faculty with novel research topics the ability to develop preliminary data to pursue follow-up funding sources. The Core Facility Seed Grant program is supported in part by the Southeastern Nanotechnology Infrastructure Corridor (SENIC), a member of the National Science Foundation’s National Nanotechnology Coordinated Infrastructure (NNCI).

The five winning projects in this round were awarded IMS cleanroom and lab access time to be used over the next year. 

The Fall 2024 IMS Core Facility Seed Grant recipients are:

Manufacturing of a Diamagnetically Enhanced PEM Electrolysis Cell
PI: Alvaro Romero-Calvo
Student: Shay Vitale
Daniel Guggenheim School of Aerospace Engineering

Biomimicking Organ-On-a-Chip Models
PI: Nick Housley
Student: Aref Valipour
School of Biological Sciences                                                            

Single-shot LWIR Hyperspectral Imaging Using Meta-optics
PI: Shu Jia
Student: Jooyeong Yun (School of Electrical and Computer Engineering)
The Wallace H. Coulter Department of Biomedical Engineering

Large-area Three-dimensional Nanolithography Using Two-photon Polymerization
PI: Sourabh Saha
Student: Golnaz Aminaltojjari
George W. Woodruff School of Mechanical Engineering

Effects of Geochemical Constraints on the Redistribution of Rare Earth Elements (REE) during Chemical Weathering
PI: Yuanzhi Tang
Student: Hang Xu
School of Earth and Atmospheric Sciences

Members of Georgia AIM’s governance team stand for a photo with Cassia Baker, a cybersecurity expert with the Georgia Manufacturing Extension Partnership (left), and David Bridges, executive vice president of Georgia Tech’s Enterprise Innovation Institute (second from right), which oversees the projects.

Georgia AIM (Artificial Intelligence in Manufacturing) recently received the “Tech for Good” award from the Technology Association of Georgia (TAG), the state’s largest tech organization.

The accolade was presented at the annual TAG Technology Awards ceremony on Nov. 6 at Atlanta’s Fox Theatre. The TAG Technology Awards promote inclusive technology throughout Georgia, and any state company, organization, or leader is eligible to apply.

Tech for Good, one of TAG’s five award categories, honors a program or project that uses technology to promote inclusiveness and equity by serving Georgia communities and individuals who are underrepresented in the tech space.

Georgia AIM comprises 16 projects across the state that connect smart technology to manufacturing through K-12 education, workforce development, and manufacturer outreach. The federally funded program is a collaborative project administered through Georgia Tech’s Enterprise Innovation Institute and the Georgia Tech Manufacturing Institute.

TAG is a Georgia AIM partner and provides workforce development programs that train people and assist them in making successful transitions into tech careers.

Donna Ennis, Georgia AIM’s co-director, accepted the award on behalf of the organization.

“Georgia AIM’s mission is to equitably develop and deploy talent and innovation for AI in manufacturing, and the Tech for Good Award reinforces our focus on revolutionizing the manufacturing economy for Georgia and the entire country,” Ennis said in her acceptance speech.

She cited the organization’s many coalition members across the state: the Technical College System of Georgia; Spelman College; the Georgia AIM Mobile Studio team at the Russell Innovation Center for Entrepreneurs and the University of Georgia; the Southwest Georgia Regional Commission; the Georgia Cyber Innovation & Training Center; and TAG and Georgia AIM’s partners in the Middle Georgia Innovation corridor, including 21st Century Partnership and the Houston Development Authority.

Ennis also acknowledged the U.S. Economic Development Administration for funding the project and helping to bring it to fruition. “But most of all,” she said, “I want to thank our manufacturers and communities across Georgia who are at the forefront of creating a new economy through AI in manufacturing. It is a privilege to assist you on this journey of technology and discovery.”

News Contact

Eve Tolpa

Glycine, one of the critical amino acids that the system converts carbon dioxide into. (Image Credit: NASA)

Professor Pamela Peralta-Yahya, lead corresponding author of the study.

Ph.D. Student Shaafique Chowdhury, first author of the study.

Ph.D. Student Ray Westenberg

Amino acids are essential for nearly every process in the human body. Often referred to as ‘the building blocks of life,’ they are also critical for commercial use in products ranging from pharmaceuticals and dietary supplements to cosmetics, animal feed, and industrial chemicals. 

And while our bodies naturally make amino acids, manufacturing them for commercial use can be costly — and that process often emits greenhouse gases like carbon dioxide (CO2).

In a landmark study, a team of researchers has created a first-of-its-kind methodology for synthesizing amino acids that uses more carbon than it emits. The research also makes strides toward making the system cost-effective and scalable for commercial use. 

“To our knowledge, it’s the first time anyone has synthesized amino acids in a carbon-negative way using this type of biocatalyst,” says lead corresponding author Pamela Peralta-Yahya, who emphasizes that the system provides a win-win for industry and environment. “Carbon dioxide is readily available, so it is a low-cost feedstock — and the system has the added bonus of removing a powerful greenhouse gas from the atmosphere, making the synthesis of amino acids environmentally friendly, too.”

The study, “Carbon Negative Synthesis of Amino Acids Using a Cell-Free-Based Biocatalyst,” published today in ACS Synthetic Biology, is publicly available. The research was led by Georgia Tech in collaboration with the University of Washington, Pacific Northwest National Laboratory, and the University of Minnesota.

The Georgia Tech research contingent includes Peralta-Yahya, a professor with joint appointments in the School of Chemistry and Biochemistry and the School of Chemical and Biomolecular Engineering (ChBE); first author Shaafique Chowdhury, a Ph.D. student in ChBE; Ray Westenberg, a Ph.D. student in bioengineering; and Georgia Tech alum Kimberly Wennerholm (B.S. ChBE ’23).

Costly chemicals

There are two key challenges to synthesizing amino acids on a large scale: the cost of materials and the speed at which the system can generate amino acids.

While many living systems like cyanobacteria can synthesize amino acids from CO2, the rate at which they do it is too slow to be harnessed for industrial applications, and these systems can only synthesize a limited number of chemicals.

Currently, most commercial amino acids are made using bioengineered microbes. “These specially designed organisms convert sugar or plant biomass into fuel and chemicals,” explains first author Chowdhury, “but valuable food resources are consumed if sugar is used as the feedstock — and pre-processing plant biomass is costly.” These processes also release CO2 as a byproduct.

Chowdhury says the team was curious “if we could develop a commercially viable system that could use carbon dioxide as a feedstock. We wanted to build a system that could quickly and efficiently convert CO2 into critical amino acids, like glycine and serine.”

The team was particularly interested in what could be accomplished by a ‘cell-free’ system that leveraged some of the processes of a cellular system — but didn’t actually involve living cells, Peralta-Yahya says. She adds that systems using living cells must divert part of their CO2 to fuel their own metabolic processes, including cell growth, and have not yet produced sufficient quantities of amino acids.

“Part of what makes a cell-free system so efficient,” Westenberg explains, “is that it can use cellular enzymes without needing the cells themselves. By generating the enzymes and combining them in the lab, the system can directly convert carbon dioxide into the desired chemicals. Because there are no cells involved, it doesn’t need to use the carbon to support cell growth — which vastly increases the amount of amino acids the system can produce.”

A novel solution

While scientists have used cell-free systems before, one of the necessary components, the cell lysate biocatalyst, is extremely costly. For a cell-free system to be economically viable at scale, the team needed to limit the amount of cell lysate the system required.

After creating the ten enzymes necessary for the reaction, the team attempted to dilute the biocatalyst using a technique called ‘volumetric expansion.’ “We found that the biocatalyst we used was active even after being diluted 200-fold,” Peralta-Yahya explains. “This allows us to use significantly less of this high-cost material — while simultaneously increasing feedstock loading and amino acid output.”

It’s a novel application of a cell-free system, and one with the potential to transform both how amino acids are produced and the industry’s impact on our changing climate. 

“This research provides a pathway for making this method cost-effective and scalable,” Peralta-Yahya says. “This system might one day be used to make chemicals ranging from aromatics and terpenes, to alcohols and polymers, and all in a way that not only reduces our carbon footprint, but improves it.”

Funding: Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, and the U.S. Department of Energy Office of Science, Biological and Environmental Research Program.

DOI: 10.1021/acssynbio.4c00359

News Contact

Written by Selena Langner