(This was my first version. See here for a new version that scores correct answers to quizzes that the instructor gives via Zoom’s poll feature, rather than merely using poll data to check on attendance.)
This do-file takes attendance and poll reports produced by Zoom and merges them into a student roster. For each Zoom lecture it determines, for each student, the total number of minutes spent logged into Zoom, the number of log-ins, and whether the student submitted at least one answer to Zoom polls. It then uses this information to calculate a simple 1-or-0 score for attendance that day. It also records attendance at weekly sections, which take place on various days. (It does not judge whether students’ poll responses were “correct” or not, just whether any were submitted.)
It uses the student’s email address to match Zoom reports with the roster. It will only work well if students are required to log into Zoom with the same email address that is in the roster file. To help check for situations where this is not the case, it generates a log file, “unknown_email.txt”.
Stata will not modify the original roster file but will generate two new files (or over-write existing files) containing the output: “roster_with_attendance.dta” and “roster_with_attendance.xlsx”, in Stata and Excel format.
The code assumes that zoomdata.do is in the working directory, which also has:
A folder named “rosters” which has in it “roster.xlsx”. This is your master roster for the course, containing columns whose first-row cells have the words “Email Address” and “Section” respectively. This Excel file may also have any number of other columns, e.g. for students’ names, etc.
A folder named “attend_lec”, containing one or more .csv files that come from Zoom, each a “report” from a class meeting. These .csv files should be named in some numerical or alphabetical sequence so that they will sort chronologically, e.g. “class01.csv” “class02.csv”, etc. There should be no .csv files in the folder other than these report files. Stata will extract the class date from the date-stamps in these files.
A folder named “poll_reports”, containing one or more .csv files that come from Zoom, each a “report” from a class meeting. Zoom does not seem to reliably record dates in such files so you must embed the month and date in each filename as in the following example: poll04_07.csv
A folder named “attend_sec”. The .csv files here are like those for lecture attendance above but each is for one section, rather than the whole class, and the sections take place on different days. They should be named “week01_A.csv”, “week01_B.csv” “week02_A.csv”, etc., for an arbitrary number of weeks (up to 99) and section names A, B, etc. Stata will compile these into a variable for number of minutes of section attended per week.
I wrote this for Stata 14.2. Comments welcome. (Updated 4/18/2020)
* zoomdata.do // Merge data from Zoom attendance and poll reports into student roster
**************************************
***** Merge lecture attendance reports
**************************************
*** Constant
local min_minutes_lec = 75 // student is considered to have attended lecture if present on Zoom for this many minutes or more
*** Initiate log file for logins from unknown emails
file open uemail using unknown_email.txt, replace write text
file write uemail "Logins found for these email addresses that are not in the master roster:" _newline
file close uemail
*** Import master roster
set more off
clear
import excel using "rosters/roster.xlsx", firstrow case(lower)
ren emailaddress email
* The TA responsible for each student
gen ta = section
label var ta "Student's teaching assistant is"
order ta, after(section)
replace ta = "(TA name)" if ta == "01A"
replace ta = "(TA name)" if ta == "01B"
replace ta = "(TA name)" if ta == "01C"
replace ta = "(TA name)" if ta == "01D"
* Teaching assistants, who show up in Zoom data
gen byte is_ta = 0
label var is_ta "Is a teaching assistant"
order section id is_ta, first
local newobs = _N + 1
set obs `newobs'
replace firstname = "(TA name)" in `newobs'
replace lastname = "(TA name)" in `newobs'
replace email = "(TA email)" in `newobs'
replace is_ta = 1 in `newobs'
local newobs = _N + 1
set obs `newobs'
replace firstname = "(TA name)" in `newobs'
replace lastname = "(TA name)" in `newobs'
replace email = "(TA email)" in `newobs'
replace is_ta = 1 in `newobs'
save "rosters/roster_with_attendance.dta", replace
*** Identify source files in lecture attendance folder
cd attend_lec
global files: dir . files "*.csv" // Get a list of all .csv files in current folder
global files_sorted: list sort global(files) // Sort them
*** Iterate over source files
foreach f of global files_sorted {
* Import and basic cleaning of one source file
di _n "Processing: " as result "`f'"
clear
import delimited using "`f'", ///
varnames(1) encoding("utf-8")
ren durationminutes minutes
label var minutes "Duration (minutes)"
ren alternativescore attention
label var attention "Attention score"
replace email = strlower(email) if email != strlower(email)
* Optionally, disregard instructors' email addresses
drop if email == "bread@ucsc.edu"
* Processing of one source file
drop attention // Doesn't seem to vary
drop name // not needed
gen date = ustrleft(jointime,10)
label var date "Date"
drop jointime leavetime
local date_prefix = substr(date[1],1,2) + "_" + substr(date[1],4,2)
drop date
local lecture_dates = "`lecture_dates'" + "`date_prefix'" + " "
sort email
by email: egen lec`date_prefix'_mins = total(minutes)
label var lec`date_prefix'_mins "Total minutes attending lecture"
by email: gen lec`date_prefix'_logins = _N
label var lec`date_prefix'_logins "Number of log-ins"
by email: keep if _n==1 // keeps one row for each email address
drop minutes // because have already captured total
save temp_`date_prefix'.dta, replace
use "../rosters/roster_with_attendance.dta", clear
merge 1:1 email using temp_`date_prefix'.dta
* Log emails that are not found in the master roster
count if _merge==2
local nonmatches = r(N)
if `nonmatches' > 0 {
file open uemail using ../unknown_email.txt, write append text
forvalues i = 1/`=_N' {
local flag = (_merge[`i']==2)
if `flag' == 1 {
file write uemail "`f' " (email[`i']) _newline
}
}
file close uemail
} // end if loop
drop if _merge==2
drop _merge
sort is_ta lastname firstname
save "../rosters/roster_with_attendance.dta", replace
erase temp_`date_prefix'.dta
} // end of "foreach file" loop
recode *_mins *_logins (. = 0)
cd .. // return to main folder
save "rosters/roster_with_attendance.dta", replace
*************************************************
***** Merge Zoom poll reports into student roster
*************************************************
* Note: Zoom's poll output is inconsistent in whether it reports date
* Need to put a date-stamp in filename of input files.
set more off
*** Identify source files in poll reports folder
cd poll_reports
global files: dir . files "*.csv" // Get a list of all .csv files in current folder
global files_sorted: list sort global(files) // Sort them
*** Iterate over source files
foreach f of global files_sorted {
* Import and basic cleaning of one source file
di _n "Processing: " as result "`f'"
clear
import delimited using "`f'", varnames(1) encoding("utf-8")
* import delimited using poll03.csv, varnames(1) encoding("utf-8")
ren useremail email
keep email // disregarding the content of the answers
replace email = strlower(email) if email != strlower(email)
* Processing of one source file
* Processing of date has to be different from attendance report because it's not consistently included
* So, take from filename
local date_prefix = substr("`f'",5,5)
sort email
by email: gen lec`date_prefix'_answers = _N
label var lec`date_prefix'_answers "Number of poll answers"
by email: keep if _n==1 // keeps one row for each email address
save temp_`date_prefix'.dta, replace
use "../rosters/roster_with_attendance.dta", clear
merge 1:1 email using temp_`date_prefix'.dta
* Log emails that are not found in the master roster
count if _merge==2
local nonmatches = r(N)
if `nonmatches' > 0 {
file open uemail using ../unknown_email.txt, write append text
forvalues i = 1/`=_N' {
local flag = (_merge[`i']==2)
if `flag' == 1 {
file write uemail "`f' " (email[`i']) _newline
}
}
file close uemail
} // end if loop
drop if _merge==2
drop _merge
sort is_ta lastname firstname
save "../rosters/roster_with_attendance.dta", replace
erase temp_`date_prefix'.dta
} // end of "foreach file" loop
cd .. // return to main folder
recode *_answers (. = 0)
order *_mins *_logins *_answers, sequential
order section-email, first
save "rosters/roster_with_attendance.dta", replace
**********************************************************
***** Merge section attendance reports into student roster
**********************************************************
*** Identify source files in section attendance folder
cd attend_sec
global files: dir . files "*.csv" // Get a list of all .csv files in current folder
global files_sorted: list sort global(files) // Sort them
*** Iterate over source files
foreach f of global files_sorted {
* Import and basic cleaning of one source file
di _n "Processing: " as result "`f'"
clear
import delimited using "`f'", varnames(1) encoding("utf-8")
keep email durationminutes
ren durationminutes minutes
label var minutes "Duration (minutes)"
replace email = strlower(email) if email != strlower(email)
* Processing week # from filename
local week_prefix = substr("`f'",5,2)
sort email
by email: egen sec`week_prefix'_smins = total(minutes)
label var sec`week_prefix'_smins "Total minutes attending section"
by email: keep if _n==1 // keeps one row for each email address
drop minutes
save temp_`week_prefix'.dta, replace
use "../rosters/roster_with_attendance.dta", clear
merge 1:1 email using temp_`week_prefix'.dta, update
* Log emails that are not found in the master roster
count if _merge==2
local nonmatches = r(N)
if `nonmatches' > 0 {
file open uemail using ../unknown_email.txt, write append text
forvalues i = 1/`=_N' {
local flag = (_merge[`i']==2)
if `flag' == 1 {
file write uemail "`f' " (email[`i']) _newline
}
}
file close uemail
} // end if loop
drop if _merge==2
drop _merge
sort is_ta lastname firstname
save "../rosters/roster_with_attendance.dta", replace
erase temp_`week_prefix'.dta
} // end of "foreach file" loop
cd ..
recode *_smins (. = 0)
order *_mins *_logins *_answers *_smins, sequential
order section-email, first
*****************************************************
* Create variable marking attendance for each lecture
* based on minutes online and poll participation
*****************************************************
foreach i in `lecture_dates' {
gen byte lec`i'_attend = 0
capture confirm variable lec`i'_answers // check whether there was a poll that day
if !_rc { // if there WAS a poll that day
replace lec`i'_attend = 1 if ///
lec`i'_mins >= `min_minutes_lec' ///
& lec`i'_mins < . ///
& lec`i'_answers > 0 ///
& lec`i'_answers < .
}
else { // if there WASN'T a poll that day
replace lec`i'_attend = 1 if ///
lec`i'_mins >= `min_minutes_lec' ///
& lec`i'_mins < .
}
label var lec`i'_attend "Attended for minimum # of minutes"
}
egen missed_classes = anycount(*_smins *_attend), values(0)
label var missed_classes "Total # of missed classes"
save "rosters/roster_with_attendance.dta", replace
export excel using "rosters/roster_with_attendance.xlsx", replace
*** Useful commands:
* browse ta first last *_smins *_attend if missed_classes > 4
* list ta first last if missed_classes > 4, clean noobs