Data Structure Guidelines • VOX Analysis

Introduction

The voxanalysis package relies on standardized data sets to power the VOX Analysis application and to support its calculations and data visualizations.

Users will interact with four specific data sets, either directly or indirectly:

Response - Response entries for the speaker, organized by response type, referent, and evaluation date.
Speaker - General information about the speaker (name, date of birth, etc.).
Summary - An aggregate of the Response data set: the total count of positive responses for each response type, grouped by evaluation date.
Upload & Export - A combination of Response and Speaker. It is used by the application for uploading and exporting data.

For VOX Analysis application users (non-R users), Upload & Export is the data structure to become familiar with. (An example .csv file is available for download.) The application exports data that follows this structure. If a user wants to modify or add data directly in this data set, they must follow the structure outlined on this page. Otherwise, the application will reject any attempt to upload the data set.

For R users, Response, Summary, and Speaker are the data structures to become familiar with. Nearly every function for variance calculations and data visualizations uses these structures.

See Data Definitions to understand the keywords found on this page.

Response Data

The Response data set provides speaker responses from the evaluation. Each entry is a binary indicator (1 or 0) that shows whether the speaker responded to a given referent for a particular response type. The data set may contain multiple evaluations, distinguished by date_of_evaluation.

Column Name	Description
date_of_evaluation	The date of the evaluation
referent	The object a listener interacts with during the evaluation
referent_order	The order the referent appeared in the session. Required for data visualizations and calculations regarding specific verbal episodes
conversing	Binary indicator for conversational response (1 = Yes, 0 = No)
labeling	Binary indicator for labeling the referent (1 = Yes, 0 = No)
echoing	Binary indicator for echoing the referent’s name (1 = Yes, 0 = No)
requesting	Binary indicator for requesting the referent’s name (1 = Yes, 0 = No)

When used as a parameter in voxanalysis functions, this data set is called df_input_response.

A sample data set can be loaded with:

library(voxanalysis)
data("df_input_response_example")

See ?df_input_response_example for more information.
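
The column layout above can also be illustrated with a small hand-built data frame. This is a minimal sketch: the values are made up, the name df_response_sketch is hypothetical, and the column types (Date for the evaluation date, integer for the indicators) are assumptions that may differ from the packaged example.

```r
# Hypothetical Response data set with made-up values.
# Column types (Date, integer) are assumptions, not confirmed by the package.
df_response_sketch <- data.frame(
  date_of_evaluation = as.Date(c("2024-01-15", "2024-01-15", "2024-01-15")),
  referent           = c("ball", "cup", "book"),
  referent_order     = c(1L, 2L, 3L),
  conversing         = c(1L, 0L, 1L),
  labeling           = c(1L, 1L, 0L),
  echoing            = c(0L, 1L, 0L),
  requesting         = c(0L, 0L, 1L),
  stringsAsFactors   = FALSE
)

# Confirm that every response-type column holds only 0 or 1
response_cols <- c("conversing", "labeling", "echoing", "requesting")
all_binary <- all(vapply(
  df_response_sketch[response_cols],
  function(x) all(x %in% c(0L, 1L)),
  logical(1)
))
```

Comparing str(df_response_sketch) with str(df_input_response_example) is a quick way to see where these assumptions diverge from the packaged data.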

Speaker Data

The Speaker data set provides key personal information about the speaker, such as name and date of birth.

Column Name	Description
first_name	The speaker’s first name
last_name	The speaker’s last name
date_of_birth	The speaker’s date of birth
language_spoken	The language spoken during the evaluation (optional)
gender	The speaker’s gender (optional)

When used as a parameter in voxanalysis functions, this data set is called df_input_speaker_info.

A sample data set can be loaded with:

library(voxanalysis)
data("df_input_speaker_info_example")

See ?df_input_speaker_info_example for more information.
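
For illustration, a Speaker data set with the columns above might be built by hand as follows. This is a sketch with invented values; the name df_speaker_sketch is hypothetical, and both the Date type and the use of NA for an omitted optional field are assumptions.

```r
# Hypothetical Speaker data set with made-up values.
# Storing date_of_birth as a Date and leaving an optional field as NA
# are assumptions, not documented package behavior.
df_speaker_sketch <- data.frame(
  first_name       = "Jane",
  last_name        = "Doe",
  date_of_birth    = as.Date("2018-06-01"),
  language_spoken  = "English",
  gender           = NA_character_,
  stringsAsFactors = FALSE
)
```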

Summary Data

The Summary data set provides an aggregated total for each response type (Conversing, Labeling, Echoing, and Requesting). These totals are grouped by evaluation date. The data set is often used as an input for plot functions and other analyses to assess patterns in response types over time.

Column Name	Description
date_of_evaluation	The date of the evaluation
conversing	Total count of conversing responses (entries equal to 1) for the evaluation date
labeling	Total count of labeling responses (entries equal to 1) for the evaluation date
echoing	Total count of echoing responses (entries equal to 1) for the evaluation date
requesting	Total count of requesting responses (entries equal to 1) for the evaluation date

When used as a parameter in voxanalysis functions, this data set is called df_summarized_response.

The Summary data set can be generated by passing df_input_response to the util_summarize_response function:

library(voxanalysis)

# Load example data
data("df_input_response_example")

# Summarize responses by type
util_summarize_response(
  df_input_response = df_input_response_example
)

A precomputed sample data set can be loaded with:

library(voxanalysis)
data("df_summarized_response_example")

See ?df_summarized_response_example and ?util_summarize_response for more information.
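
To make the aggregation concrete, the same kind of per-date totals can be computed in base R. This is a sketch of the idea only, not the implementation of util_summarize_response; the input values are made up and the names df_response_sketch and df_summary_sketch are hypothetical.

```r
# Hypothetical Response data covering two evaluation dates (made-up values)
df_response_sketch <- data.frame(
  date_of_evaluation = as.Date(c("2024-01-15", "2024-01-15",
                                 "2024-02-15", "2024-02-15")),
  referent           = c("ball", "cup", "ball", "cup"),
  referent_order     = c(1L, 2L, 1L, 2L),
  conversing         = c(1L, 0L, 1L, 1L),
  labeling           = c(1L, 1L, 0L, 1L),
  echoing            = c(0L, 1L, 0L, 0L),
  requesting         = c(0L, 0L, 1L, 1L)
)

# Sum each binary indicator within each evaluation date
df_summary_sketch <- aggregate(
  cbind(conversing, labeling, echoing, requesting) ~ date_of_evaluation,
  data = df_response_sketch,
  FUN  = sum
)
```

The result has one row per evaluation date and one total per response type, which matches the column layout described above.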

Upload & Export Data

The Upload & Export data set consolidates information about the speaker and their responses. It merges the Speaker and Response data sets for ease of export by the application user.

Column Name	Description
first_name	The speaker’s first name
last_name	The speaker’s last name
date_of_birth	The speaker’s date of birth
language_spoken	The language spoken during the evaluation (optional)
gender	The speaker’s gender (optional)
date_of_evaluation	The date of the evaluation
referent	The object a listener interacts with during the evaluation
referent_order	The order the referent appeared in the session. Required for data visualizations and calculations regarding specific verbal episodes.
conversing	Binary indicator for conversational response (1 = Yes, 0 = No)
labeling	Binary indicator for labeling the referent (1 = Yes, 0 = No)
echoing	Binary indicator for echoing the referent’s name (1 = Yes, 0 = No)
requesting	Binary indicator for requesting the referent’s name (1 = Yes, 0 = No)

Note: application users MUST follow this structure when modifying .csv downloads. The application exports this data in a form that can be re-uploaded, which lets the user append new data over time. If a user changes column names or column order, or adds ill-formatted entries, the application will reject the .csv file on upload.
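
R users who edit an exported file may want to check it against the rules in the note above before uploading. The following is a minimal sketch; check_exportable_sketch is a hypothetical helper, not a voxanalysis function, and the application's actual validation rules may be stricter than the two checks shown here (column names and order, and binary response values). The date strings in the demo row are an assumed format.

```r
# Documented column names, in the documented order
expected_cols <- c(
  "first_name", "last_name", "date_of_birth", "language_spoken", "gender",
  "date_of_evaluation", "referent", "referent_order",
  "conversing", "labeling", "echoing", "requesting"
)

# Hypothetical helper: returns a character vector of problems (empty if none)
check_exportable_sketch <- function(df) {
  problems <- character(0)
  if (!identical(names(df), expected_cols)) {
    problems <- c(problems, "column names or order differ from the documented structure")
  } else {
    for (col in c("conversing", "labeling", "echoing", "requesting")) {
      # NA values also fail this check because NA %in% c(0, 1) is FALSE
      if (!all(df[[col]] %in% c(0, 1))) {
        problems <- c(problems, paste0("non-binary value in column: ", col))
      }
    }
  }
  problems
}

# Made-up one-row example that follows the structure
df_ok <- data.frame(
  first_name = "Jane", last_name = "Doe", date_of_birth = "2018-06-01",
  language_spoken = "English", gender = NA,
  date_of_evaluation = "2024-01-15", referent = "ball", referent_order = 1,
  conversing = 1, labeling = 0, echoing = 0, requesting = 1
)

problems_ok       <- check_exportable_sketch(df_ok)          # well-formed input
problems_reversed <- check_exportable_sketch(df_ok[, 12:1])  # reversed column order
```

In practice the input would come from reading the edited .csv file into a data frame rather than from a hand-built one.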

When used as a parameter in voxanalysis functions, this data set is called df_output_exportable. An example .csv file is also available for download.

A sample data set can be loaded with:

library(voxanalysis)
data("df_output_exportable_example")

For more details, see ?df_output_exportable_example.
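
To show how the Speaker and Response columns fit together in this structure, the two can be combined in base R with a cross join. This is only a sketch under the assumption of a single speaker row; the data values and the names ending in _sketch are hypothetical, and the package may offer its own helper for this step.

```r
# Made-up Speaker data (one row) and Response data (two rows)
df_speaker_sketch <- data.frame(
  first_name = "Jane", last_name = "Doe",
  date_of_birth = as.Date("2018-06-01"),
  language_spoken = "English", gender = NA_character_,
  stringsAsFactors = FALSE
)
df_response_sketch <- data.frame(
  date_of_evaluation = as.Date(c("2024-01-15", "2024-01-15")),
  referent = c("ball", "cup"),
  referent_order = c(1L, 2L),
  conversing = c(1L, 0L), labeling = c(1L, 1L),
  echoing = c(0L, 1L), requesting = c(0L, 0L),
  stringsAsFactors = FALSE
)

# by = NULL requests the Cartesian product, so the single speaker row is
# repeated for every response row; speaker columns come first
df_export_sketch <- merge(df_speaker_sketch, df_response_sketch, by = NULL)
```

The combined data frame has the twelve columns listed in the table above, in the same order, with one row per response entry.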