Mapping USC Research to the SDGs

Motivation

The goal of this project was to map USC-affiliated research to the UN Sustainable Development Goals (SDGs) in order to:

  1. Create a central research directory (via an interactive dashboard) where USC students, postdocs and faculty can connect with USC scholars based on shared interests.
  2. Track USC’s progress on its goal to increase its interdisciplinary research and training for sustainability and climate change solutions (see USC’s Asgmt: Earth Research Goals).
  3. Improve USC’s STARS (Sustainability Tracking, Assessment & Rating System) rating from Silver to Gold in our next submission in 2024.
    a. One of USC’s low-scoring areas in 2021 was related to data gaps in sustainability research metrics (see USC’s 2021 score for the AC-9 credit).
  4. Create a framework for other institutions to use to conduct similar SDG mapping projects and to accelerate their STARS reporting process for the AC-9 credit.

Figures: Assignment Earth Research; UN's 17 SDGs

Problem

What method should we use to classify USC publications by the UN SDGs?

We used and compared two methods:

  1. Use an mBert Machine Learning Model (Aurora University)
  2. Use Scopus Query results (Elsevier)

How should we display the results for an optimized user experience?

We created a dashboard using RShiny, based on the Scopus query results.

Data Collection

Data for Testing Aurora’s Machine Learning Program

We created a curated list of publications, each with a manually assigned primary SDG and secondary SDGs.

Data for Creating an Interactive Research Dashboard

We downloaded 2020-2022 USC-affiliated publications from Scopus (sign in with USC’s VPN to access).

Methods

Machine Learning

We tested Aurora’s Machine Learning mBert Model on:

  1. Our manually curated dataset (10 publications per SDG)
  2. The output of the Scopus query results
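Running a classifier over the full Scopus export means scoring tens of thousands of rows (26,335 in our case, as noted under Results). One common way to keep memory use bounded is to stream the file in batches and write predictions out as you go. The sketch below shows this pattern only; it is not Aurora's interface. The names classify_file, read_in_batches, dummy_classifier, the "Abstract" column, and the batch size are all hypothetical, and the placeholder classifier would need to be replaced by a real call to the model. Whether batching resolves a given crash depends on where the memory is actually being used.

```python
import csv
import io
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def read_in_batches(rows: Iterable[Dict[str, str]], batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most batch_size rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def classify_file(infile, outfile,
                  classify_batch: Callable[[List[str]], List[str]],
                  text_column: str = "Abstract",  # assumed column name
                  batch_size: int = 500) -> int:
    """Stream a CSV, label each batch, and write rows out immediately.

    Only one batch is held in memory at a time. Returns the number of rows written.
    """
    reader = csv.DictReader(infile)
    writer = None
    total = 0
    for batch in read_in_batches(reader, batch_size):
        labels = classify_batch([row[text_column] for row in batch])
        for row, label in zip(batch, labels):
            out = {**row, "predicted_sdg": label}
            if writer is None:
                # Create the writer lazily so the header matches the input columns.
                writer = csv.DictWriter(outfile, fieldnames=list(out))
                writer.writeheader()
            writer.writerow(out)
        total += len(batch)
    return total


def dummy_classifier(texts: List[str]) -> List[str]:
    """Hypothetical stand-in for the real model call; keyword match only."""
    return ["SDG 13" if "climate" in t.lower() else "unclassified" for t in texts]


# Tiny in-memory example instead of the real Scopus export.
sample = io.StringIO(
    "Title,Abstract\n"
    "A,Climate adaptation in cities\n"
    "B,Graph theory notes\n"
    "C,Climate policy\n"
)
result = io.StringIO()
n_rows = classify_file(sample, result, dummy_classifier, batch_size=2)
```

With a real export, infile and outfile would be file handles opened with newline="" and the batch size would be tuned to the memory available on the machine.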

RShiny Dashboard

We created an interactive dashboard using RShiny.

Results

Machine Learning

  1. Output from Aurora’s mBert model on our manually curated dataset:
    • In our manually curated dataset, we assigned a primary SDG and secondary SDGs to each publication, since a publication can map to multiple SDGs.
    • We took the predicted SDG with the highest probability as the predicted primary SDG.
    • 54% accuracy when comparing the predicted primary SDG with the primary SDG that we manually assigned to the publication.
    • 77% accuracy when comparing the predicted primary SDG with any of the secondary SDGs that we manually assigned to the publication.
    • 92% accuracy when comparing any of the SDGs predicted by the model with any of our manually assigned SDGs for a given publication.
    • See the output below for detailed ML results for each SDG class. Note: the balanced accuracy reported there is a different measure from the accuracy figures above, which compare the model output directly with the manually assigned SDGs.
      Figures: Confusion Matrix part 1; Confusion Matrix part 2
  2. Output from Aurora’s mBert model on the Scopus Query Results
    • In progress: the run currently crashes our laptops (the dataset has 26,335 rows).
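The three accuracy figures for the curated dataset can be computed along the following lines. This is a minimal sketch on made-up toy records, not our evaluation script. The record layout, the function names, and the 0.5 probability threshold used to decide which SDGs count as "predicted" are all assumptions. The second measure is read here as a match against the primary or any secondary SDG, which is an interpretation. The last function shows one common one-vs-rest definition of per-class balanced accuracy, (sensitivity + specificity) / 2, to illustrate why it can differ from the overall figures.

```python
from typing import Dict, List, Optional


def primary_prediction(probs: Dict[int, float]) -> int:
    """Predicted primary SDG = the SDG with the highest probability."""
    return max(probs, key=probs.get)


def evaluate(records: List[dict], threshold: float = 0.5) -> Dict[str, float]:
    """Return the three accuracy measures as fractions of all records."""
    primary_hits = label_hits = any_hits = 0
    for rec in records:
        manual = {rec["primary"]} | set(rec["secondary"])
        top = primary_prediction(rec["probs"])
        # Assumption: an SDG counts as predicted if it clears the threshold
        # (the top SDG is always included).
        predicted = {s for s, p in rec["probs"].items() if p >= threshold} | {top}
        primary_hits += int(top == rec["primary"])
        label_hits += int(top in manual)
        any_hits += int(bool(predicted & manual))
    n = len(records)
    return {
        "primary_vs_primary": primary_hits / n,
        "primary_vs_any_manual": label_hits / n,
        "any_vs_any": any_hits / n,
    }


def balanced_accuracy(records: List[dict], sdg: int) -> Optional[float]:
    """One-vs-rest balanced accuracy for one SDG, using primary labels only."""
    tp = fn = fp = tn = 0
    for rec in records:
        is_true = rec["primary"] == sdg
        is_pred = primary_prediction(rec["probs"]) == sdg
        if is_true and is_pred:
            tp += 1
        elif is_true:
            fn += 1
        elif is_pred:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        return None  # undefined when the class, or its complement, is absent
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy records (invented for illustration): model probabilities per SDG number,
# plus the manually assigned primary and secondary SDGs.
records = [
    {"probs": {13: 0.9, 7: 0.6, 1: 0.1}, "primary": 13, "secondary": [7]},
    {"probs": {3: 0.7, 2: 0.55}, "primary": 2, "secondary": [3]},
    {"probs": {4: 0.8, 10: 0.6}, "primary": 10, "secondary": [5]},
    {"probs": {6: 0.9, 14: 0.2}, "primary": 11, "secondary": []},
]
scores = evaluate(records)
sdg13_balanced = balanced_accuracy(records, 13)
```

On these toy records the three measures come out as 0.25, 0.5 and 0.75, while the balanced accuracy for SDG 13 alone is 1.0. This shows how a per-class balanced accuracy can look quite different from the overall accuracy.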

RShiny Dashboard

Discussion

What we did

What the Results Imply

Improvements that can be made

What we learned

We learned to use data tools!