ProQuest TDM Studio: Data Visualizations

TDM Studio is ProQuest’s text and data mining platform. This platform enables researchers to create datasets using licensed ProQuest content and analyze those datasets by running Python or R scripts in an accompanying Jupyter Notebook. A component of TDM Studio is Data Visualizations, a platform for researchers with little to no coding experience. Data Visualizations is accessible to any member of the Mason community.

The first feature of Data Visualizations is geographic analysis. Geographic analysis maps articles based on locations named within the articles. All articles come from publications GMU subscribes to through ProQuest, including such titles as the New York Times, Washington Post, and Los Angeles Times. Users can create up to five unique projects, and each project can contain up to 10,000 articles. Other methodologies, including topic modeling, sentiment analysis, and n-gram/term frequency, are currently in development, as is an export data feature.
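
To give a sense of what geographic analysis is doing behind the scenes, here is a minimal sketch, in Python, of how place names might be pulled out of article text. This is an illustration only, not ProQuest's implementation: the sample text is invented, the spaCy model and entity labels are assumptions, and Data Visualizations itself requires no coding on your part.

    # Illustrative sketch (not ProQuest's code): extract the place names an article mentions.
    # Assumes spaCy and its small English model are installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    from collections import Counter

    import spacy

    nlp = spacy.load("en_core_web_sm")

    # Hypothetical article text; TDM Studio supplies licensed full text instead.
    article_text = (
        "Officials in Washington and New York announced new transit funding, "
        "while reporters in Los Angeles covered the regional response."
    )

    doc = nlp(article_text)

    # GPE covers countries, cities, and states; LOC covers other named locations.
    places = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]
    print(Counter(places))

Each extracted place name could then be geocoded to coordinates and plotted as a map marker, which is roughly what the geographic visualization displays.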

Accessing ProQuest TDM Studio: Data Visualizations

  1. Go to tdmstudio.proquest.com/home.
  2. Click log in to TDM Studio.
  3. Click create an account. Enter your GMU email address, read through the terms of use and check the box if you consent, then click create account.
  4. You will receive an email at your GMU address. Click the link in the email to confirm the creation of your account.
  5. Begin using Data Visualizations.

Using ProQuest TDM Studio: Data Visualizations

  1. Once you’re logged in, you will be directed to the visualization dashboard. From the dashboard you can manage your projects and interact with them through the available analysis methods. You can create up to five projects. Click create new project to begin.
  2. In the search box, enter your search terms. The following publications will be searched for the specified terms: Chicago Tribune; Washington Post; Wall Street Journal; New York Times; The Globe and Mail; Los Angeles Times; The Guardian; Sydney Morning Herald; South China Morning Post; and Times of India.
  3. Next, refine the dataset. Only 10,000 documents can be analyzed per project, and you can narrow your results by publication, date published, and document type. Click next: review project when finished.
  4. You will be given a summary of your project that includes the document count, publications, and the selected analysis. Enter a descriptive project name and click create project.
  5. A dialog box will appear that says your project was successfully submitted. Once you close the message you will be redirected to the visualization dashboard.
  6. Your project will take time to generate. You will see the name, date range, search query, count, publications, and analysis method listed for your newly created project. Once the project has been successfully generated, you can click show actions. Click delete if you wish to delete the project, or click open geographic visualization to open the visualization.
  7. The geographic visualization will open with a global view. The top menu includes the project name, the number of articles in the project, the date range, and an option to export data.
  8. Click on a cluster or map marker. A drawer will open on the right-hand side of the screen listing the articles in that cluster or marker. The larger the cluster, the longer the drawer will take to open. If you click on the title of an article, a new tab or window will open and you will be directed to the article itself. Click hide articles to close the drawer.
  9. You can use the slider along the bottom of the map to change the date range. The map will update to reflect the new date range.
  10. Later updates to the platform will enable users to export data.

Want to try out Data Visualizations? Follow the link listed above, or you can find Data Visualizations on the Libraries’ A-Z Database list.

If you have questions about Data Visualizations or TDM Studio, contact the Digital Scholarship Center (DiSC) at datahelp@gmu.edu.

2020 ICPSR Data Fair

September 21-25, 2020
All free, all virtual, all open to the public

With all the unexpected twists and turns of 2020, the ICPSR Data Fair will provide a data lens on timely topics such as the elections, Black Lives Matter, the Census, higher education, immigration, COVID-19, and so much more.

Data Fair Schedule

Register via this registration form. Attendees register once for the full Data Fair and will receive links to all presentations as part of their attendee materials.

Important to know:

  • All presentations take place via Zoom. Links to all presentations will be sent directly to registrants the week prior to the Data Fair.
  • Participants who attend five or more presentations will receive a Certificate of Completion.
  • Participants who attend ten or more presentations will be featured on the Data Fair website.

Getting Remote Assistance from DiSC (Spring 2020)

Even with the university closure and other uncertainties, DiSC staff continues to provide assistance to the Mason community.

Advice

DiSC has many online InfoGuides with information about the most common issues people encounter. These will continue to be available and updated.

As always, you can get assistance with any of your data needs by emailing datahelp@gmu.edu. We typically reply within 1 business day.

Consultations

DiSC consultants continue to be available for appointments, but all consultations will be conducted virtually. The university has a subscription to WebEx, which enables video chat and screen sharing.

Computer Lab

See below for advice on access to the most popular software. If you need additional assistance, please contact datahelp@gmu.edu. We can also suggest alternative solutions using free and open software.

Access to Data

Almost all library resources, including data, are available from off campus. For subscription sources, be sure to access the databases through links on the library website, such as the Subject List or the A-Z List, so that you will be prompted to log in.

Use our InfoGuides on finding data for access to popular data resources. Here are two good ones to start with:

Find Data & Statistics: Best Places to Start – useful for accessing sources for looking up statistics on a topic or building a quick table.

Find Data for Analysis – useful for finding datasets and data sources for data analysis projects.

Again, if you can’t find what you need, ask us.

Software Access

The university has a Virtual Computing Lab that provides access to some of the software through Microsoft’s Remote Desktop application (PC and Mac). With increased demand, it may not be as available as it has been, and many people have difficulty connecting to and using it. Those using Stata tend to be the most satisfied with it.

It may be best for students to install software themselves or otherwise have individual access to it. Here are some options for student access to the most popular software, as well as some alternatives worth considering. The purchase column lists the lowest-cost option.

Love Data Week Workshops Feb 10-Feb 14

To celebrate Love Data Week (February 10 – February 14, 2020), the Digital Scholarship Center (DiSC) will be running a series of workshops on getting started with R, Python, GIS, text analysis, using secondary data, and managing data projects. The workshops will take place in the DiSC Lab, Room 2701A, Fenwick Library. All are welcome to attend regardless of skill level. Registration is strongly encouraged; click on the time links below to register.

Working With and Analyzing Secondary Data – Monday, February 10, 1:00 PM and 5:00 PM
Using Voyant for Text Analysis – Tuesday, February 11, 1:00 PM and 5:00 PM
R/Python: How and Why to Get Started – Wednesday, February 12, 1:00 PM and 5:00 PM
OSF 101: Introduction to the Open Science Framework – Thursday, February 13, 1:00 PM and 5:00 PM
Introduction to GIS and Mapping – Friday, February 14 at 1:00 PM

On Monday, February 10 at 1 PM and 5 PM, Wendy Mann will lead a workshop on Working With and Analyzing Secondary Data. She will discuss how to acquire, review, and analyze secondary data. Participants will learn how to prepare this kind of data for analysis and bring it into a statistical package. Reviewing datasets and documentation will also be covered.

Learn how to Use Voyant for Text Analysis on Tuesday, February 11 at 1 PM and 5 PM. Alyssa Fahringer will provide an overview of Voyant, showcase projects that utilize the platform, and discuss use cases. She will walk attendees through how to upload and explore a corpus in Voyant as well as how to embed and export your data.

On Wednesday, February 12 at 1 PM and 5 PM, Debby Kermer will go over How and Why to Get Started with R and Python. She will cover when those languages should be used, what to know about them prior to getting started, and resources for learning them. The final half hour of the workshop will be devoted to answering questions and assisting with software installation and hands-on learning.

Come for an Introduction to the Open Science Framework on Thursday, February 13 at 1 PM and 5 PM. Margaret Lam and Carl Leak will discuss how to navigate and create projects on the Open Science Framework (OSF). Attendees will learn about reproducible research practices, how to track activity, and how to use Templates and Forks in OSF to make new projects.

On Friday, February 14 at 1 PM, Joy Suh will lead an Introduction to GIS and Mapping. Participants will learn the basics of visualizing geographic information and creating maps in a GIS. She will talk about how to understand geospatial data, where to find mapping source data, and how to use ESRI ArcGIS. Additionally, attendees will learn how to read and interpret maps and data, and how to use cartographic principles to create maps for presentations and publications.