Sample Selector

Sample Selector is a tool for creating and editing samples: groups of data that you compare across. They are not "samples" in the statistical sense but are closer to filters.

By default, a single sample exists: "All Data". With the Sample Selector, you can create new samples to organize your data.

You can use samples to:

A sample is composed of one or more filters, specific conditions that narrow down your sample.

Creating a sample

The general process for creating a sample is to:

The effect of multiple filters

DataShop interprets each filter after the first as an additional restriction on the data that is included in the sample. This is also known as a logical "AND". You can see the results of multiple filters in the sample preview as soon as all filters are "saved".
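The AND behavior can be illustrated with a small Python sketch. It models each filter as a predicate over one row of data and keeps only the rows that pass every filter. The row fields (`School`, `Condition`) and the helper `apply_sample` are hypothetical and do not reflect how DataShop stores samples internally.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Filter = Callable[[Row], bool]


def apply_sample(rows: List[Row], filters: List[Filter]) -> List[Row]:
    """Keep rows that pass ALL filters (logical AND).

    With no filters, all() is True for every row, which mirrors "All Data".
    """
    return [row for row in rows if all(f(row) for f in filters)]


# Made-up rows; the field names are hypothetical examples.
rows = [
    {"Student": "s1", "School": "A", "Condition": "treatment"},
    {"Student": "s2", "School": "A", "Condition": "control"},
    {"Student": "s3", "School": "B", "Condition": "treatment"},
]

filters = [
    lambda r: r["School"] == "A",             # first filter
    lambda r: r["Condition"] == "treatment",  # each extra filter narrows further
]

sample = apply_sample(rows, filters)
print([r["Student"] for r in sample])  # ['s1']
```

Adding a filter can only shrink the sample, never grow it, which is why the preview updates as a narrowing of the previous result.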

Help

What can I do with DataShop?

improve my tutoring software

build or test a model for computer-based assessment

Educational technology data can be used to accurately assess student proficiency, both conceptual and procedural. Feng et al. (2009) provide a great example of how accurate assessment can be achieved while students are learning from an online tutor, and show that dynamic learning data in fact enhances prediction. A model built from online interactions predicts standardized test scores with a correlation of over 0.8!
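The correlation reported above is a Pearson correlation between a model's predicted scores and students' actual test scores. A generic way to compute it is sketched below; the numbers are made up, and this is not the model or data from Feng et al. (2009).

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Illustrative values only: predicted vs. observed test scores for 5 students.
predicted = [52.0, 61.0, 70.0, 75.0, 88.0]
observed = [50.0, 65.0, 68.0, 79.0, 85.0]
print(round(pearson_r(predicted, observed), 3))
```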

There are plenty of further opportunities for exploring the quality of online interaction for assessment. Datasets that also have attached pre- or post-test data are particularly good candidates for this goal (see below).

Projects with pre- and post-test data attached

Perfetti - Read Write Integration
Perceptual Fluency in Geometry Achievement
Robust learning with a Meta-Cognitive Tutor
Intelligent Writing Tutor
Geometry Cognitive Model Discovery Closing-the-Loop
Teachable Peer Learner

improve student learning in my system

There are many ways DataShop can help you analyze your dataset to discover how you might improve student learning from your system. First, a simple strategy is to inspect learning curves to see if any are "low and flat", implying that students are being asked to do easy tasks repeatedly, potentially wasting their valuable learning time (see Cen et al., 2007).
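As a rough illustration of the "low and flat" check, the sketch below builds an error-rate learning curve for one KC (the proportion of errors at each practice opportunity) and flags it when every point is low and the fitted slope is near zero. The helper names and both thresholds (`max_error=0.2`, `max_abs_slope=0.01`) are assumptions for illustration, not DataShop's own learning-curve categorization.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One observation: (student id, opportunity number starting at 1, was it an error?)
Obs = Tuple[str, int, bool]


def error_rate_curve(observations: Iterable[Obs]) -> List[float]:
    """Error rate at each opportunity number, pooled across students.

    Assumes opportunity numbers are consecutive (1, 2, 3, ...).
    """
    errors: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for _student, opp, was_error in observations:
        counts[opp] += 1
        errors[opp] += int(was_error)
    return [errors[opp] / counts[opp] for opp in sorted(counts)]


def slope(ys: List[float]) -> float:
    """Least-squares slope of ys against opportunity 1..n (needs n >= 2)."""
    n = len(ys)
    mean_x = (n + 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(1, n + 1), ys))
    den = sum((x - mean_x) ** 2 for x in range(1, n + 1))
    return num / den


def is_low_and_flat(curve: List[float], max_error: float = 0.2,
                    max_abs_slope: float = 0.01) -> bool:
    """Hypothetical rule: every point is low and the trend is near zero."""
    if len(curve) < 2:
        return False
    return max(curve) <= max_error and abs(slope(curve)) <= max_abs_slope


# Made-up data: 10 students, 4 opportunities, exactly one error per opportunity.
obs = [(f"s{i}", opp, i == opp) for i in range(10) for opp in range(1, 5)]
curve = error_rate_curve(obs)
print(curve, is_low_and_flat(curve))
```

A curve that starts high and drops (for example 0.6, 0.4, 0.2, 0.1) fails the check, which is the shape you would expect when students are actually learning.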

A second, more sophisticated approach is to inspect your learning curves to identify opportunities for improving your knowledge component (KC) model. See Stamper et al. (2011) and watch either of these two videos. Koedinger et al. (2013) describe how an improved KC model was used to redesign a tutor and report an experiment showing that students learn faster and better from the redesigned tutor than from the original. Koedinger & McLaughlin (2010) provide a similar result; both show how KC model improvements can inspire the design of novel instructional tasks.

A third, automated approach is to employ Learning Factors Analysis (LFA; see Koedinger et al., 2012). If you would like us to apply LFA to your dataset, contact us. There are many other ways researchers have improved their systems and run experiments demonstrating that these improvements work. See the topic Test an instructional principle.

Watch a video in which researchers Martina Rau and Richard Scheines discuss sense-making before fluency. Using data collected from fourth- and fifth-grade students who used an intelligent tutoring system for learning fractions, Rau et al. found that instruction which first emphasizes making sense of fraction concepts with graphical representations, and only then builds fluency in using those representations, produces significantly greater learning gains. For their award-winning EDM 2013 conference paper, see Rau et al. (2013).

test an instructional principle

analyze data from another system to get ideas

analyze process data from experiment

Many hypotheses on learning are tested through in vivo experimentation with data stored in DataShop. Within DataShop, users can create samples on subsets of data and compare different conditions within the data. When separate samples are created for experimental conditions, selecting them all will yield learning curves for each and performance profiler data charts for each.

You can see examples of the kinds of analyses that researchers have performed by clicking on the show related datasets and papers link below and reading one of those papers. For example, MacLaren et al. (2008) show results of analyzing process data to see if experimental conditions produce different patterns of hint requests (Table 5) or produce different amounts of example study or problem solving (Table 6). One way to do such an analysis is to export the dataset from the Export tab. You may want to export one of the smaller "rollup" exports, like the student-problem rollup or the student-step rollup, which give you higher level summary data. You can open the export in your favorite tool, such as R or Excel (e.g., use a pivot table with condition in the rows, Knowledge Component in columns, and average of hints in the cells).
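If you prefer code to a spreadsheet pivot table, the same summary (average hints per condition and KC) can be sketched with Python's standard library. The sketch assumes a tab-delimited student-step rollup export with columns named `Condition`, `KC (Default)`, and `Hints`; check the header row of your own export, because the column names depend on the dataset and its KC models. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO, Tuple


def mean_hints_by_condition_and_kc(f: TextIO) -> Dict[Tuple[str, str], float]:
    """Average of the Hints column for each (Condition, KC) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in csv.DictReader(f, delimiter="\t"):
        key = (row["Condition"], row["KC (Default)"])  # assumed column names
        totals[key] += float(row["Hints"])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Tiny made-up export for illustration.
sample_export = (
    "Anon Student Id\tCondition\tKC (Default)\tHints\n"
    "s1\tworked-example\tcircle-area\t0\n"
    "s2\tworked-example\tcircle-area\t2\n"
    "s3\tproblem-solving\tcircle-area\t3\n"
)
result = mean_hints_by_condition_and_kc(io.StringIO(sample_export))
for (condition, kc), avg in sorted(result.items()):
    print(condition, kc, avg)
```

For a real export, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory string.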

Error rates, times, and hints can also be viewed by condition in learning curves or the performance profiler by creating samples for each condition and selecting those. An example dataset that has condition samples is Digital Games for Improving Number Sense - Study 1 (on the Learning Curve tab, inspect the two existing samples and try turning them on and off).

Show related datasets and papers

applications of Bayesian modeling

There are a number of places where Bayesian modeling can be observed or enhanced using datasets in DataShop. Bayesian Knowledge Tracing (BKT) has been studied extensively in the context of cognitive tutors. Notable examples include Baker et al. (2008), who estimate slip and guess parameters, and Koedinger et al. (2011), who explore student thrashing in knowledge component mastery. DataShop also includes an external tool, contributed by Michael Yudelson, that fits BKT models; it is described in this video tutorial.
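For orientation, the standard BKT update can be written in a few lines. The sketch below traces the probability that a student knows a KC across a sequence of correct and incorrect responses, given fixed initial-knowledge, learn, guess, and slip parameters. It only applies the update rule with illustrative parameter values; it does not fit parameters and is not the code of the Yudelson tool mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BKTParams:
    p_init: float   # P(L0): probability the KC is known before practice
    p_learn: float  # P(T): probability of learning at each opportunity
    p_guess: float  # P(G): probability of a correct answer when not known
    p_slip: float   # P(S): probability of an error when known


def bkt_trace(params: BKTParams, responses: Iterable[bool]) -> List[float]:
    """Return P(known) after each observed response (True = correct)."""
    p_known = params.p_init
    trace = []
    for correct in responses:
        # Bayesian update of P(known) given the observed response.
        if correct:
            num = p_known * (1 - params.p_slip)
            den = num + (1 - p_known) * params.p_guess
        else:
            num = p_known * params.p_slip
            den = num + (1 - p_known) * (1 - params.p_guess)
        posterior = num / den
        # Transition: the student may learn the KC at this opportunity.
        p_known = posterior + (1 - posterior) * params.p_learn
        trace.append(p_known)
    return trace


# Illustrative parameter values, not fitted to any dataset.
params = BKTParams(p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1)
print([round(p, 3) for p in bkt_trace(params, [False, True, True])])
```

Fitting tools search for the four parameter values that best predict the observed responses; this sketch shows only the forward update they rely on.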

detecting motivation or engagement

A number of researchers in the fields of educational data mining and learning analytics focus on students' affective states. Using DataShop, it is possible to detect a student's level of motivation or engagement by looking at patterns in the data. Baker and colleagues have built models for detecting behaviors such as gaming the system (Baker et al., 2008) or being off task (Cocea et al., 2009). Often the additional data needed to build these detectors is first tagged by humans and then turned into a model using EDM and machine learning techniques (Baker & de Carvalho, 2008). This additional information can easily be imported into and stored in DataShop using custom fields.

determine the grain size of transfer of learning

discovering knowledge component/skill/cognitive/student models

explore student collaboration data

While much of the data in DataShop is from individual use of tutors, online courses, games, etc., there are a number of datasets that include or involve some form of student collaboration. We not only encourage the addition of more such datasets, but also more secondary analyses of existing datasets. Two projects in DataShop with such data are Fractions Collaboration and Individual Data and Rummel - Improving Algebra Learning and Collaboration.

Note: If you are not finding what you are looking for, do not hesitate to ask us.

modeling the rate of learning

Some fundamental cognitive and educational psychology questions that analysis of DataShop data could help answer are:

1) How "fast" do humans learn?
2) What is the shape of the "learning curve" (e.g., Chi et al., 2011)?
3) Are there individual student differences in the rate of learning (cf., Yudelson et al., 2013)?

There has been significant research on question 2 using reaction time as the measure of performance (e.g., Heathcote et al., 2000, cited below), but insufficient investigation of the shape of the learning curve when the performance measure is error rate (an arguably more relevant variable for educational goals).

Pursuing any of these questions in the near term is quite likely to lead to a valuable (and publishable!) scientific contribution.

Heathcote, A., Brown, S., and Mewhort, D. J. K. 2000. The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin and Review 7, 2, 185–207.

multiple skills

You might have data from problems or activities in which steps are tagged with multiple skills or knowledge components (KCs), so that a single student answer or response may require several of them. This is not an unusual situation, and DataShop can handle it. When building and importing a KC model, you can add extra columns with the same KC heading to assign multiple KCs to a single student step.
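The repeated-column idea can be sketched as follows: each step gets one `KC (MyModel)` column per KC, and steps with fewer KCs are padded with empty cells. This is a hypothetical illustration. `Step Name` and `KC (MyModel)` are placeholder headings, `MyModel` is an invented model name, and a real import file should start from a KC model export of your own dataset so that all required columns and headings match.

```python
import csv
import io
from typing import Dict, List, TextIO

MODEL_NAME = "MyModel"  # hypothetical name for the new KC model


def write_kc_model(step_kcs: Dict[str, List[str]], out: TextIO) -> None:
    """Write one tab-delimited row per step, repeating the KC column heading."""
    width = max([1] + [len(kcs) for kcs in step_kcs.values()])
    header = ["Step Name"] + [f"KC ({MODEL_NAME})"] * width
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for step, kcs in step_kcs.items():
        # Pad with empty cells so every row has the same number of columns.
        writer.writerow([step] + kcs + [""] * (width - len(kcs)))


buf = io.StringIO()
write_kc_model(
    {
        "find-area": ["circle-area"],
        "find-shaded-area": ["circle-area", "subtract-areas"],
    },
    buf,
)
print(buf.getvalue())
```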

Multiple skills can present challenges when trying to track individual skills, because blame assignment becomes an issue: when a student makes an error on a step tagged with several skills, it is unclear which skill to blame. Such is the case in Koedinger et al. (2011), where steps assigned multiple skills led to a problem-selection thrashing issue: some students could not get past a set of problems because of incorrect blame assignment in the skill model. VanLehn et al. (2005) discussed issues related to multiple skills in the Andes physics intelligent tutor and found it better to isolate individual skills. This area of investigation (multiple skills) deserves more attention and is ripe for scientific investigation and progress.

predicting student performance

test a model of metacognition

A number of researchers have built models of aspects of metacognitive (or "learning to learn") strategies that have been influenced by and/or tested with datasets in DataShop. If you are considering creating a new model of a metacognitive strategy, take a look at Aleven & Koedinger (2002) for an example of analyzing data to identify limitations in students' metacognitive (or self-regulatory learning) behaviors. Such limitations suggest opportunities for modeling desired metacognitive behavior (e.g., Aleven et al., 2004) and for developing tutoring support at the metacognitive level (e.g., Roll et al., 2007). Looking across multiple DataShop datasets, one can test for long-term effects of a metacognitive intervention (e.g., Roll et al., 2011).

In addition to exploring metacognitive help-seeking strategies (as in references above), other metacognitive strategies have also been explored including self-explanation (e.g., Shi et al., 2008), error self-correction (e.g., Mathan & Koedinger, 2005), self-assessment (e.g., Long & Aleven, 2013), and collaboration skills (e.g., Walker et al., 2011). Many more are possible!

test a theory of motivation

test a theory of performance or learning

If, for example, you want to test whether a power law or an exponential function better fits learning data, you might use DataShop datasets as follows. Export data from a dataset (e.g., Geometry Area, 1996-1997), open it in a software package such as Matlab or R, and use modeling methods such as generalized linear regression to compare alternative versions of your theory. You can find instructions on how to read an exported file into R here.
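As one concrete, simplified version of this comparison, the sketch below fits both functions to an error-rate learning curve by ordinary least squares on log-transformed error rates. A power law E = a * x^(-b) is linear in log(x), and an exponential E = a * exp(-b * x) is linear in x, so each fit is a straight-line regression of log(E). This log-linear shortcut is a stand-in for, not an implementation of, the generalized linear regression mentioned above. The function names and the `eps` floor for zero error rates are assumptions.

```python
import math
from typing import Dict, List, Tuple


def ols(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Fit ys = intercept + slope * xs; return (intercept, slope, SSE)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def compare_power_vs_exponential(error_rates: List[float],
                                 eps: float = 1e-3) -> Dict[str, object]:
    """error_rates[i] is the error rate at opportunity i + 1."""
    opps = [float(i + 1) for i in range(len(error_rates))]
    log_err = [math.log(max(e, eps)) for e in error_rates]  # avoid log(0)
    # Power law: log E = log a - b * log x
    _, b_pow, sse_pow = ols([math.log(x) for x in opps], log_err)
    # Exponential: log E = log a - b * x
    _, b_exp, sse_exp = ols(opps, log_err)
    return {
        "power": {"rate": -b_pow, "sse": sse_pow},
        "exponential": {"rate": -b_exp, "sse": sse_exp},
        "better": "power" if sse_pow < sse_exp else "exponential",
    }


curve = [0.45, 0.32, 0.26, 0.22, 0.19, 0.17, 0.16, 0.15]  # made-up error rates
print(compare_power_vs_exponential(curve)["better"])
```

Both fits are compared on the same response (log error rate), so their SSE values are comparable. Heathcote et al. (2000) caution that averaging curves across learners can favor a power law, so consider fitting per student as well as on pooled data.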

test my data mining method on multiple data sets

If you built a model for detecting a specific affective state, you can test your detector across multiple datasets. Baker et al. (2006) applied a detector of gaming the system to a number of DataShop datasets. Koedinger et al. (2012; EDM 2012 Best Paper) showed an automated technique for improving knowledge component models across 11 different datasets. You can also connect to DataShop through web services, which makes it easier to run your own analyses on multiple datasets.
