Loading Churros

organizing thoughts...

WEEK 4: DATA AND DATA ANALYSIS

← check out more reflections

Task Objective: Data collection

For this week's workshop, our group was tasked with Scenario 2: University-led Data Collection. In this simulation, we act as an external data analysis company commissioned by the university. Our brief is to audit and analyze the students' "digital engagement" to help the university improve its student services. For the task, we must design and pilot a survey using Microsoft Forms, that extracts actionable insights for the university without compromising student anonymity.

Our Objective (Self-Initiated Work)

Build a survey and share it to all the students in the class to collect data, which will later be visualized and used for analysis.

Data is the new oil: but how do we drill it? (Possler et al., 2019)

Data sets are the backbone of modern analytics, acting as vast reservoirs of information that empower researchers, analysts, and data scientists to derive meaningful insights. These structured collections of data, whether big or small, serve as the raw material for uncovering patterns, trends, and correlations that drive informed decision-making. Each data set is a unique snapshot, capturing a specific aspect of reality, and when meticulously curated and analyzed, it becomes a powerful tool for understanding complex phenomena. In the ever-expanding landscape of information, data sets stand as the building blocks upon which innovation, research, and progress are constructed, offering a lens through which we can gain a deeper understanding of our world.

Challenges: This task goes beyond simply asking questions. We had to ask the right questions. We needed to craft questions that prompted honest reflections on AI usage habits. It was imperative that we moved away from the "policing" of academic misconduct by using AI. Specifically, we had to look at how students are actually using Generative AI in their studies, not to catch them cheating, but to understand the "gaps" in university support.

Survey Question Brainstorm

Figure 1: A snapshot of our collaborative Word document on which we brainstormed questions for the survey.

Collecting data the correct way

Privacy-by-design: The "validity" of our data relies on student trust. If there is even a hint that their responses could be traced back to them, the data becomes corrupted by fear. Our task is to ensure the survey settings and the questions themselves are "privacy-by-design."

Demographics: It seemed intrusive and unnecessary to ask the students for their name, phone number/mail ID and gender for such a survey. However, we asked for age and international status, not viewing them as identifiers, we viewed them as support indicators. If the data shows that international students are disproportionately using "Live Translation" tools, the positive outcome isn't to ban them, but for the University to officially license better translation software to ensure equity.

Using the Likert scale strategically: By placing Grammarly, a socially accepted tool, next to ChatGPT, often stigmatized in academia, in the Likert scale, we are subtly testing the boundaries of what students consider "cheating." This distinction is vital for the University. If students see "Live Translation" as essential for accessibility but "Copilot" as academic dishonesty, the University needs to tailor its support policies differently for each tool, rather than issuing a blanket ban on AI.

Mapping the crisis points: If students admit they use AI mostly for "Source-finding" or "Summarizing," it reveals a gap in the curriculum. So the university library can take action to offer better workshops on research skills, or lecturers can provide better reading lists. The data acts as a diagnostic tool for where the course is currently failing to support students.

Survey Questions 1-2

Figure 2.1: A snapshot of the first two questions on our survey.

Survey Questions 3-5

Figure 2.2: A snapshot of questions 3, 4 and 5 on our survey.

Survey Final Questions

Figure 2.3: A snapshot of the final questions on our survey.

Reflecting on DATA COLLECTION

Since the start of the twenty-first century, data collection practices have increasingly disregarded the need for consent. Data compilers started assuming they had the right to harvest online content without requiring agreements or ethics reviews, ultimately leading to more ethically troubling methods of extraction. For example, one professor at the University of Colorado Springs secretly installed a camera on campus to photograph over 1,700 students and staff, using these unapproved images to train his personal facial recognition technology (Crawford, 2021).

While this example represents the extreme of unethical surveillance, it served as a stark warning for our own group project. Even on a smaller scale, collecting this data made us consider so many important questions surrounding consent and ethics.

References

  • Crawford, K. 2021. Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press.
  • Possler, D., Bruns, S. and Niemann-Lenz, J. 2019. Data Is the New Oil—But How Do We Drill It? Pathways to Access and Acquire Large Data Sets in Communication Science. International Journal of Communication. 13, pp.3894–3911.