Motivation

The goal of this project was to map USC-affiliated research to the UN Sustainable Development Goals (SDGs) in order to:

Create a central research directory (via an interactive dashboard) where USC students, postdocs and faculty can connect with USC scholars based on shared interests.
Track USC’s progress on its goal to increase its interdisciplinary research and training for sustainability and climate change solutions (see USC’s Asgmt: Earth Research Goals).
Improve USC’s STARS (Sustainability Tracking, Assessment & Rating System) rating from Silver to Gold in our next submission in 2024. One of USC’s low-scoring areas in 2021 was related to data gaps in sustainability research metrics (see USC’s 2021 score for the AC-9 credit).
Create a framework that other institutions can use to conduct similar SDG mapping projects and to accelerate their STARS reporting for the AC-9 credit.

Assignment Earth Research UN's 17 SDGs

Problem

What method should we use to classify USC publications by the UN SDGs?

We used and compared two methods:

An mBERT machine learning model (Aurora Universities Network)
Scopus query results (Elsevier)

How should we display the results for an optimized user experience?

We created a dashboard using RShiny, based on the Scopus query results.

Data Collection

Data for Testing Aurora’s Machine Learning Program

Created a curated list of publications, each with manually assigned primary and secondary SDGs

10 publications per SDG, including ‘0’ (for research not related to the SDGs)

Data for Creating an Interactive Research Dashboard

Downloaded 2020-2022 USC-affiliated publications from Scopus (sign in through USC’s VPN for access)

Selected the following USC affiliations: University of Southern California, Keck School of Medicine of USC, USC Norris Comprehensive Cancer Center, Los Angeles County USC Medical Center, Information Sciences Institute, USC School of Pharmacy, USC Marshall School of Business, Herman Ostrow School of Dentistry of USC, USC Ethel Percy Andrus Gerontology Center, “Women’s Hospital, Los Angeles”, USC Gould School of Law, Keck Hospital of USC, Keck Medicine of USC
Used Elsevier’s 2022 search queries for SDG 1 to 16
Downloaded selected columns
Assigned SDG 0 to all USC publications that did not match any of the SDG queries
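
The labeling step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the publication IDs, the `sdg_matches` mapping (SDG number to the set of IDs returned by that SDG's query), and the function name `assign_sdgs` are all hypothetical.

```python
from collections import defaultdict


def assign_sdgs(all_ids, sdg_matches):
    """Map each publication ID to a sorted list of SDG numbers.

    Publications that appear in no query result receive [0].
    """
    labels = defaultdict(set)
    for sdg, ids in sdg_matches.items():
        for pub_id in ids:
            labels[pub_id].add(sdg)
    # An empty list is falsy, so unmatched publications fall back to [0].
    return {pub_id: sorted(labels[pub_id]) or [0] for pub_id in all_ids}


# Hypothetical publication IDs and query results, for illustration only.
all_ids = ["p1", "p2", "p3", "p4"]
sdg_matches = {3: {"p1", "p2"}, 13: {"p2"}, 7: set()}

assignments = assign_sdgs(all_ids, sdg_matches)
for pub_id, sdgs in assignments.items():
    print(pub_id, sdgs)
```

Keeping a list of SDGs per publication, rather than a single value, preserves the fact that one publication can match several SDG queries.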

Methods

Machine Learning

We tested Aurora’s mBERT machine learning model on:

Our manually curated dataset (10 publications per SDG)
The output of the Scopus queries

RShiny Dashboard

We created an interactive dashboard using RShiny:

Identified the USC authors of each publication by parsing the downloaded data, querying the Scopus API, and web scraping the USC faculty directory
Adapted the R code used for the USC Curriculum Mapping Dashboard, authored by Brian Tinsley and Dr. Julie Hopper
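
The parsing part of the author-matching step might look roughly like the sketch below. It assumes each record carries a semicolon-separated string of authors in the form "Surname, Initials, Affiliation text"; that layout, the sample record, and the function name `usc_authors` are assumptions for illustration, and the Scopus API and faculty-directory scraping steps are not shown. The project itself used R; Python is used here only for illustration.

```python
# A subset of the USC affiliation names listed under Data Collection.
USC_AFFILIATIONS = (
    "University of Southern California",
    "Keck School of Medicine of USC",
    "Information Sciences Institute",
)


def usc_authors(authors_with_affiliations):
    """Return the authors whose affiliation text mentions a USC affiliation."""
    found = []
    for entry in authors_with_affiliations.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Assumed layout: "Surname, Initials, Affiliation text"
        parts = [p.strip() for p in entry.split(",", 2)]
        if len(parts) < 3:
            continue  # skip entries that do not follow the assumed layout
        surname, initials, affiliation = parts
        if any(name.lower() in affiliation.lower() for name in USC_AFFILIATIONS):
            found.append(f"{surname}, {initials}")
    return found


# Hypothetical record, for illustration only.
record = (
    "Doe, J., Department of Biology, University of Southern California, "
    "Los Angeles, United States; "
    "Roe, A., Department of Physics, Example University, Springfield, United States"
)
print(usc_authors(record))
```

Plain substring matching is a deliberate simplification; name variants and abbreviations are one reason a directory lookup is still useful alongside it.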

Results

Machine Learning

Output from Aurora’s mBERT model on our manually curated dataset:
- In our manually curated dataset, we assigned a primary SDG and secondary SDGs to each publication, since a publication can map to multiple SDGs.
- We took the predicted SDG with the highest probability as the predicted primary SDG.
- 54% accuracy when comparing the predicted primary SDG with the primary SDG that we manually assigned to the publication.
- 77% accuracy when comparing the predicted primary SDG with any of the secondary SDGs that we manually assigned to the publication.
- 92% accuracy when comparing any of the SDGs predicted by the model with any of our manually assigned SDGs for a given publication.
- See output below for detailed ML results for each SDG class. Note: the balanced accuracy reported there differs from the accuracy figures above, which compare the model output directly with our manually assigned SDGs.
Output from Aurora’s mBERT model on the Scopus query results:
- In progress (running the model on the full dataset of 26,335 rows currently crashes our laptops).
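
The three accuracy figures reported for the curated dataset can be computed along the lines of the sketch below. Two points are assumptions rather than details from the project: the second figure is taken to count a hit when the top prediction matches the primary or any secondary SDG (the 54% and 77% figures could not both hold if the two comparisons were mutually exclusive), and "predicted SDGs" is taken to mean every SDG whose probability reaches a hypothetical `threshold` of 0.5, plus the top prediction. The sample records are made up.

```python
def evaluate(records, threshold=0.5):
    """Compute three accuracy variants for multi-label SDG predictions.

    Each record is a dict with:
      'probs'     - dict mapping SDG number to predicted probability
      'primary'   - manually assigned primary SDG
      'secondary' - set of manually assigned secondary SDGs
    """
    n = len(records)
    primary_hits = top_in_manual_hits = any_overlap_hits = 0
    for r in records:
        probs = r["probs"]
        top = max(probs, key=probs.get)  # predicted primary SDG
        manual = {r["primary"]} | set(r["secondary"])
        # Assumed definition of "predicted SDGs": above threshold, plus the top one.
        predicted = {sdg for sdg, p in probs.items() if p >= threshold} | {top}
        primary_hits += top == r["primary"]
        top_in_manual_hits += top in manual
        any_overlap_hits += bool(predicted & manual)
    return {
        "top_vs_primary": primary_hits / n,
        "top_vs_any_manual": top_in_manual_hits / n,
        "any_vs_any": any_overlap_hits / n,
    }


# Made-up records, for illustration only.
records = [
    {"probs": {3: 0.9, 10: 0.6}, "primary": 3, "secondary": {10}},
    {"probs": {13: 0.8, 7: 0.7}, "primary": 7, "secondary": {13}},
    {"probs": {2: 0.7, 6: 0.55}, "primary": 1, "secondary": {6}},
    {"probs": {4: 0.6}, "primary": 5, "secondary": set()},
]
scores = evaluate(records)
print(scores)
```

Each criterion is looser than the one before it, so the three scores can only stay equal or increase from first to last, which matches the ordering of the reported figures.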

RShiny Dashboard

Discussion

What we did

Ran Aurora’s ML model to categorize publications
Created an RShiny dashboard

What the Results Imply

Using Elsevier’s search queries:
- There are a lot of USC publications mapped to SDG 3
- Half the USC publications are not related to any SDG
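
Observations like these come from counting how many publications fall under each SDG. A minimal sketch, assuming a hypothetical `assignments` mapping from publication ID to its list of SDG numbers (with 0 meaning no SDG matched):

```python
from collections import Counter


def sdg_shares(assignments):
    """Return the fraction of publications mapped to each SDG.

    A publication can carry several SDGs, so the shares may sum to more than 1.
    """
    counts = Counter(sdg for sdgs in assignments.values() for sdg in sdgs)
    total = len(assignments)
    return {sdg: counts[sdg] / total for sdg in sorted(counts)}


# Made-up assignments, for illustration only.
assignments = {"p1": [3], "p2": [3, 13], "p3": [0], "p4": [0]}
shares = sdg_shares(assignments)
print(shares)
```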

Improvements that can be made

Correct false negatives (cases where USC scholars do not map to any SDG even though they conduct research related to specific SDGs).
Investigate false positives and other SDG mapping errors, then correct them to improve the accuracy of the dashboard output.
Compare the Aurora mBERT output with the Scopus query results in the RShiny dashboard, and display a graph of the predicted SDG probabilities for each publication.
Improve the performance of the RShiny dashboard:
- Large number of dropdown options
- Improve graphics
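
One possible way to compare the mBERT output with the Scopus query results is a per-publication overlap score. The sketch below uses the Jaccard index, which is our choice for illustration and not something the project has implemented; the label dictionaries and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two label collections (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement(model_labels, query_labels):
    """Return per-publication Jaccard scores and their mean.

    Only publications present in both mappings are compared.
    """
    shared = sorted(set(model_labels) & set(query_labels))
    if not shared:
        raise ValueError("no publications in common")
    scores = {pid: jaccard(model_labels[pid], query_labels[pid]) for pid in shared}
    return scores, sum(scores.values()) / len(scores)


# Made-up labels, for illustration only.
model_labels = {"p1": [3], "p2": [3, 13], "p3": [0]}
query_labels = {"p1": [3], "p2": [13], "p3": [7]}

per_pub, mean_score = agreement(model_labels, query_labels)
print(per_pub, mean_score)
```

Publications with low scores would be natural candidates for the manual review of false positives and false negatives described above.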

What we learned

Learned to use data tools!

R, RShiny, git
Deep learning models (Aurora’s mBERT program)

Mapping USC Research to the SDGs
