DataShop: a data analysis service for the learning science community
About / Frequently Asked Questions (FAQ)
Updated May 29, 2012
What is DataShop?
The Pittsburgh Science of Learning Center (PSLC) DataShop is the world's preeminent
central repository for data on the interactions between students and educational software
and a suite of tools to analyze that data. It provides secure data storage as well as an
array of exploratory analysis and visualization tools available through a web-based
interface.
How do I access DataShop?
DataShop access is free. You can access DataShop by going to: http://pslcdatashop.web.cmu.edu. DataShop
supports both InCommon and Google SSO. On the
login page you can
choose to authenticate with either your university or Google account. After
authenticating, if you've never logged in to DataShop you will be asked to create
a free account. No information we collect will be distributed to third parties. For
more information on accessing DataShop, see
our help topic on the subject.
What are the capabilities of DataShop?
DataShop can store many types of data associated with online courses and
learning-science studies. The analysis and visualization tools are particularly well-suited
for click-stream data from interactive learning environments such as intelligent tutoring
systems and virtual labs. In addition, you can store related publications, files,
presentations, or electronic artifacts.
What can DataShop do for me?
DataShop facilitates data representation and collection, and exploratory analysis.
Toward collecting data in a uniform format, we have developed a standard XML logging format
and two logging libraries (one in Flash ActionScript, the other in Java) to write this XML.
Data can also be imported using a similar tab-delimited format. After importing data or
logging the data to the DataShop database, the DataShop web application can help you start
exploratory data analysis with tools for common learning science analyses. You can also
export data for further manipulation and analysis in other tools.
Researchers have used DataShop to explore learning issues in a variety of educational domains. These include, but are not limited to, collaborative problem solving in Algebra (Rummel, Spada, & Diziol, 2007), self-explanation in Physics (Hausmann & VanLehn, 2007), the effectiveness of worked examples and polite language in a Stoichiometry tutor (McLaren, Lim, Yaron, & Koedinger, 2007), and the optimization of knowledge component learning in Chinese (Pavlik, Presson, & Koedinger, 2007).
I want to do x. What dataset should I use?
Contact
us with some information about your goals and we'll do our best to recommend a dataset.
If your goal is to just explore DataShop, see the list of recommended datasets at the top of
the dataset list after logging in.
What statistical support is available in DataShop?
The statistical support directly available in the DataShop is limited to statistics on
learning curves and knowledge component models. However, you can export the data to a file and
use your favorite statistical software package.
DataShop integrates the results of the AFM (Additive Factor Model) algorithm, a logistic regression performed over the "error rate" learning curve data. Because AFM is a logistic regression, its predictions are bounded between 0 and 1; it attempts to find the best-fit curve for error-rate data, which also ranges between 0 and 1. The results of this model are shown as the "predicted learning curve" on each line graph of student error rate. The predicted learning curves are the average predicted error of a skill over each of the learning opportunities.
The Model Values page in DataShop presents a quantitative analysis of how well, given the selected knowledge component model, the AFM statistical model fits the data (via AIC, BIC, and log likelihood) and how well it might generalize to an independent dataset from the same tutor (via cross-validation RMSE). For more on the Additive Factor Model, see "Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor through Educational Data Mining" (Cen, Koedinger, and Junker, 2007).
What format is the data in? In what format can I get the data out?
DataShop accepts data according to the Tutor
Message format. Data can come in as XML or tab-delimited text. Once processed, the data
is stored in a relational database. Data can be exported to a tab-delimited text file.
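As a rough illustration of working with a tab-delimited export outside DataShop, the sketch below reads tab-separated rows and computes an empirical error rate per learning opportunity, the quantity plotted on an error-rate learning curve. The column names ("Anon Student Id", "KC", "Opportunity", "Outcome") and the sample rows are assumptions made up for this example; check the header row of your own export file and adjust the names accordingly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample shaped like a tab-delimited export. The column names
# and rows are assumptions for illustration, not a guaranteed export layout.
SAMPLE_EXPORT = (
    "Anon Student Id\tKC\tOpportunity\tOutcome\n"
    "stu1\tadd-fractions\t1\tINCORRECT\n"
    "stu1\tadd-fractions\t2\tCORRECT\n"
    "stu2\tadd-fractions\t1\tINCORRECT\n"
    "stu2\tadd-fractions\t2\tINCORRECT\n"
    "stu2\tadd-fractions\t3\tCORRECT\n"
)


def error_rate_by_opportunity(handle):
    """Return {opportunity: fraction of rows whose outcome is not CORRECT}."""
    attempts = defaultdict(int)
    errors = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        opp = int(row["Opportunity"])
        attempts[opp] += 1
        # Anything other than CORRECT is counted as an error in this sketch.
        if row["Outcome"].strip().upper() != "CORRECT":
            errors[opp] += 1
    return {opp: errors[opp] / attempts[opp] for opp in sorted(attempts)}


curve = error_rate_by_opportunity(io.StringIO(SAMPLE_EXPORT))
for opp, rate in curve.items():
    print(f"opportunity {opp}: error rate {rate:.2f}")
```

To run the same function on a real export, pass a file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the file.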
What kind of data gets logged?
Primarily, DataShop stores data on learner interactions with online course and study
materials, including intelligent tutors, virtual labs, simulations, and games. We plan to store more
types of data (e.g., audio and video data, writing samples) in the future.
Data is collected from the seven PSLC courses (Algebra, Chemistry, Chinese, English, French, Geometry, and Physics) and various studies. There are also sources external to the PSLC that contribute to DataShop, such as middle school math data from the Assistment project at WPI.
How do I get my data into DataShop?
The best method for getting your data into DataShop depends on the state of your project.
If you are developing a course or study and have not yet collected data, then you probably want to log student-tutor transactions as they occur. The page Logging New Data describes a number of scenarios where you would log data from your course or study to DataShop.
If you are interested in storing and viewing data from a course or study that has occurred in the past, then you probably want to import the existing data. The page Importing New Data describes the two main types of data that can be imported into DataShop: XML files and tab-delimited text files.
Can I use DataShop data for my own research purpose?
You do not need permission to view or use public data sets; they are freely accessible
to any researcher in the world. For private data sets, if you are the PI or have permission
from the PI, you may examine the data sets and use them in your own research.
To gain access to private data sets, first create an account (see "How do I access DataShop?" above), then visit the Other Datasets tab and click the "Request Access" button next to the name of the project you would like to access. In the dialog that appears, enter a brief reason why you would like access. The request for access will be sent to the project's principal investigator and data provider (if one exists). The status of your request will be shown on the Access Requests page. Any projects for which you have been given access will appear on the My Datasets page. If you're not sure what data you need, please contact us and we'll do our best to help.
I ran a LearnLab study. Who has access to my data, and how do I control access?
The principal investigator of a LearnLab study has full control over his/her own data.
With a new data set, only the PI has access to the data. We might not know you're
the PI, so please tell us! Other users of DataShop may request access to your data set; it's up
to you who receives access. A user can have view access to the data set, or edit access,
which allows the changing of data set metadata and adding or removing papers and files.
How do I get or create custom queries, analyses, or reports?
If you have a general feature or change in mind, we encourage you to contact us. In the
past, a number of reports and modifications to DataShop have started this way. If the
analysis is specific to your project and unlikely to benefit others, however, you might be
better off exporting the data from DataShop and performing the analysis in another program
such as SPSS, R, or Excel. (For instance, many kinds of reports can be generated from Excel
if you know how to use features like Pivot Tables and Auto Filter.) The line between these
two categories of analyses isn't always clear, so don't hesitate to start a dialogue with us
regarding your needs.
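For a project-specific report of the pivot-table kind mentioned above, a few lines of script over an exported file can stand in for a spreadsheet. The sketch below counts rows for each combination of two columns; the column names ("Anon Student Id", "Outcome"), the outcome values, and the sample rows are assumptions for illustration and should be replaced with the headers in your own export.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-delimited rows; column names and values are assumptions.
SAMPLE_ROWS = (
    "Anon Student Id\tOutcome\n"
    "stu1\tCORRECT\n"
    "stu1\tINCORRECT\n"
    "stu1\tCORRECT\n"
    "stu2\tHINT\n"
    "stu2\tCORRECT\n"
)


def pivot_counts(handle, row_field, col_field):
    """Count rows for each (row_field value, col_field value) pair."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[(row[row_field], row[col_field])] += 1
    return counts


table = pivot_counts(io.StringIO(SAMPLE_ROWS), "Anon Student Id", "Outcome")

# Print the counts as a small student-by-outcome grid.
students = sorted({student for student, _ in table})
outcomes = sorted({outcome for _, outcome in table})
print("student\t" + "\t".join(outcomes))
for student in students:
    print(student + "\t" + "\t".join(str(table[(student, o)]) for o in outcomes))
```

A `Counter` returns 0 for combinations that never occur, so the grid needs no special handling for empty cells.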
What is the time frame between completing a study and getting data in/from DataShop?
The time frame varies depending primarily on the source of the data. Data from tutors that log
directly to the PSLC server is moved into the DataShop database daily. For this
reason, we encourage you to develop tutors using
CTAT, which can log data to
the PSLC server for you.
Tutors that produce log data but do not log directly to the PSLC server, such as Andes (Physics LearnLab) or the Carnegie Learning Cognitive Tutors (Algebra and Geometry LearnLabs), must go through a collection and conversion process. The length of this process depends on the availability of the personnel to collect and anonymize the data, as well as the state of the program needed to run the conversion. Also note that conversion of extremely large datasets can add time. If you need a dataset urgently, please contact us and put "urgent" in the subject of the email.
What restrictions are there on publishing about another researcher's data?
As long as proper IRB rules and guidelines have been followed and you have access to the data
through DataShop, you may publish an analysis you have conducted on another researcher's data. You must
acknowledge the source of the data in your publication. Additional information is available on our
Citing Datashop and Datasets help page.
What is the relationship between DataShop, Cognitive Tutor Authoring Tools
(CTAT), and the Open Learning Initiative (OLI)?
The three projects—DataShop, CTAT, and
OLI—are often in communication with one another
and in some cases build on each other's technology. CTAT is a research project at CMU that creates tools
for building intelligent tutors. OLI, also a CMU
project, researches and builds open and free online courses. In short, any tutor created
with CTAT has logging functionality built-in and can create data in the format DataShop
accepts, so we often recommend you use CTAT if you're developing a new intelligent tutor or
application. CTAT tutors can log directly to DataShop, decreasing the amount of time between
when your students use the tutors and when you can view your data in DataShop.
I'm testing a CTAT tutor that should be logging but I don't see any log data in DataShop. Why not?
Although troubleshooting depends on lots of specifics, here are some general things to
check:
For troubleshooting logging from CTAT tutors, see a few pages on the CTAT website: Troubleshooting logging from Flash tutors and Logging from Java. Also, don't hesitate to contact the CTAT team.
Where can I get more help?
DataShop documentation is online at http://pslcdatashop.web.cmu.edu/help.
You can also subscribe to the DataShop users email list or email the DataShop team.