
Data Structure Guidelines
datamodel.Rmd
Introduction
The voxanalysis
package relies on standardized data sets
to power the VOX Analysis application, as well as supporting
calculations and data visualizations.
There are four specific data sets that users will interact with, either directly or indirectly. These include:
- Response - Response entries for the speaker, organized by response type, referent, and evaluation date.
- Speaker - General information about the speaker (name, date of birth, etc.).
- Summary - An aggregate of the Response data set, where a total count of positive responses by response type grouped by evaluation date.
- Upload & Export - A combination of Response and Speaker. It is used by the application for uploading and exporting data.
For VOX Analysis application users (non-R users), Upload & Export is the data structure to familiarize. (You can download this .csv file for example.) The application will export data that follows this structure. If a user wants to modify or add data directly to this data set, they must ensure they follow the structure outlined in this page. Otherwise, the application will reject any attempt to upload the data set.
For R users, Response, Summary, and Speaker are useful data structures to familiarize. Nearly every function for variance calculations and data visualizations use these structures.
See Data Definitions to understand keywords found in this page.
Response Data
The Response data set provides speaker responses from the evaluation.
Each entry is a binary indicator (1 or 0) that shows whether the speaker
responded to a given referent for a particular response type. The data
set may contain multiple evaluations, distinguished by
date_of_evaluation
.
Column Name | Description |
---|---|
date_of_evaluation | The date of the evaluation |
referent | The object a listener interacts with during the evaluation |
referent_order | The order the referent appeared in the session. Required for data visualizations and calculations regarding specific verbal episodes |
conversing | Binary indicator for conversational response (1 = Yes, 0 = No) |
labeling | Binary indicator for labeling the referent (1 = Yes, 0 = No) |
echoing | Binary indicator for echoing the referent’s name (1 = Yes, 0 = No) |
requesting | Binary indicator for requesting the referent’s name (1 = Yes, 0 = No) |
When used as a parameter in voxanalysis
functions, this
data set is called df_input_response
.
A sample data set can be generated with:
library(voxanalysis)
data("df_input_response_example")
See ?df_input_response_example
for more information.
Speaker Data
The Speaker data set provides key personal information about the speaker, such as name and date of birth.
Column Name | Description |
---|---|
first_name | The speaker’s first name |
last_name | The speaker’s last name |
date_of_birth | The speaker’s date of birth |
language_spoken | The language spoken during the evaluation (optional) |
gender | The speaker’s gender (optional) |
When used as a parameter in voxanalysis
functions, this
data set is called df_input_speaker_info
.
A sample data set can be generated with:
library(voxanalysis)
data("df_input_speaker_info_example")
See ?df_input_speaker_info_example
for more
information.
Summary Data
The Summary data set provides an aggregated total for each response type (Conversing, Labeling, Echoing, and Requesting). These totals are grouped by evaluation date. The data set is often used as an input for plot functions and other analyses to assess patterns in response types over time.
Column Name | Description |
---|---|
date_of_evaluation | The date of the evaluation |
conversing | Binary indicator for conversational response (1 = Yes, 0 = No) |
labeling | Binary indicator for labeling the referent (1 = Yes, 0 = No) |
echoing | Binary indicator for echoing the referent’s name (1 = Yes, 0 = No) |
requesting | Binary indicator for requesting the referent’s name (1 = Yes, 0 = No) |
When used as a parameter in voxanalysis
functions, this
data set is called df_summarized_response
.
The summary data set can be generated by passing
df_input_response
to the
util_summarize_response
function:
library(voxanalysis)
# Load example data
data("df_input_response_example")
# Summarize responses by type
util_summarize_response(
df_input_response = df_input_response_example
)
A sample data set can be generated with:
library(voxanalysis)
data("df_summarized_response_example")
See ?df_summarized_response_example
and
?util_summarize_response
for more information.
Upload & Export Data
The Upload & Export data set consolidates information about the speaker and their responses. It merges the Speaker and Response data set together, for ease-of-export by the application user.
Column Name | Description |
---|---|
first_name | The speaker’s first name |
last_name | The speaker’s last name |
date_of_birth | The speaker’s date of birth |
language_spoken | The language spoken during the evaluation (optional) |
gender | The speaker’s gender (optional) |
date_of_evaluation | The date of the evaluation |
referent | The object a listener interacts with during the evaluation |
referent_order | The order the referent appeared in the session. Required for data visualizations and calculations regarding specific verbal episodes. |
conversing | Binary indicator for conversational response (1 = Yes, 0 = No) |
labeling | Binary indicator for labeling the referent (1 = Yes, 0 = No) |
echoing | Binary indicator for echoing the referent’s name (1 = Yes, 0 = No) |
requesting | Binary indicator for requesting the referent’s name (1 = Yes, 0 = No) |
Note: application users MUST follow this structure when modifying .csv downloads. The application exports this data in a way that it can be re-uploaded. This gives the user the ability to append new data overtime. If a user modifies column names, column order, or adds ill-formatted entries, the application will reject the .csv file when the user uploads it.
When used as a parameter in voxanalysis
functions, this
data set is called df_output_exportable
. You can also download this .csv file
for example.
A sample data set can be generated with:
library(voxanalysis)
data("df_output_exportable_example")
For more details, see ?df_output_exportable_example
.