Friday, January 28, 2011

Two more data structures...

At long last, we are done "piloting" our materials for the interviews. I conducted two interviews yesterday at GHS - the real deal. It is a sad commentary that pulling kids out of class for 30 minutes gets me many more volunteers than an offer of lunch! (DP, the teacher, says that so long as their brains are engaged in some hard thinking, he's fine with me pulling people out of class.) In any case, two kind souls, R and J, agreed to come.

R was quick and organized all the information in a sectioned fashion. It was kind of hierarchical, but not strictly organized in the nested table fashion that we are thinking of. He created a partitioning of the space by categorical variables that were "above" the vehicle in some sense and then proceeded to code the vehicle type and speed together for all vehicles. He then did a second pass through both segments and added the information about the distance from the preceding vehicle. He had some way to answer all questions but did not go so far as to set up a two-way correlation for the last two questions. The flat data structure appeared to him to be more organized and "detailed" - not that he had missing values, but the information was "clearer," he said.

J favoured words over numbers. "I like to describe in words rather in numbers" - his own words. The recording took longer, mainly because he was narrating the information and needed more time to write it all out. He initially made the mistake of not recording the vehicle type and recorded all the vehicles as cars. Since the rest of his information was complete, I hazard that this had more to do with our colloquial tendency to refer to any vehicle as a "car" than with considering an attribute unimportant. This meant that he was unable to answer two of the questions. He went back and fixed this information, and was then able to answer the questions that he had missed the first time around. J found the flat structure messier and insisted that his own was easier to read, but he did acknowledge that if you like numbers more, then the flat structure was the better one.

Spots that glow:

  • Both had a case-centric view of data. They kept the attributes for a case together - even in J's narrative style.
  • BF's fear that we are now going to get flat tabular structures is unfounded, at least for this small sample.
  • Protocol was easier to administer this time - had reasonable success with "talk out aloud". So maybe the warmup problems help? Not sure...
  • In any case, the task is definitely clear now.




Thursday, January 20, 2011

Parents and Polish: Reflections on the DRK-12 PI Meeting


The two memorable revelations for me of the 2010 DR K-12 PI meeting were the exceptionally polished software products on display and the lack of discussion about parental responsibility for learning.

Speaking as a software developer, the quality of the software products was an awakening: projects are no longer content to create materials that fall short of commercial production quality. This is a really heartening development because, after all, if our goal is to make the most difference to the greatest number of students, then we must look beyond our conventional avenues of dissemination and distribution. As an aside, “publisher” should not be a dreaded word - why should the avenue of dissemination be treated like a condition to be tolerated rather than a partner to be embraced? The presence of more than a couple of "research products" in the App Store is a good start. Perhaps CADRE would do well to establish a SIG that focuses on product development. Often, when presented with the product of a research project, an emerging researcher wonders what went into that product. Similarly, when a research project creates a software product, it is beneficial for other projects to be able to discuss how to recruit programmers, graphic designers, and project managers on an NSF budget that cannot match an industry salary. It was very apparent, at least in a couple of the projects, that the software was not the output of a technically savvy grad student. So, I'd love to see not only more such products emerge but also a community develop that provides support to projects creating software products.

To move on to my second revelation, I want to begin by saying that at education conferences, I avoid sessions that focus on equity. Equity for me is a topic that should deal with teachers working in heterogeneous classrooms where the heterogeneity is a function of differences in learning styles of students rather than of differences in home environments and parental engagement. It is really strange to me that in this climate of strong calls for teacher accountability there is not an equivalent call for parental accountability. It is not a 10th grade math teacher's job to deal with the fact that a student in her class is failing in math because they cannot read the content - that may seem harsh but I am sorry, that's just not her job. Cathy Black, chancellor of schools in New York City, when touring schools that work, said, "Where there’s a strong and effective principal, where parents are committed, you have great schools."

Why is it OK for a parent to send a child to school with little or no preparation in terms of motivation to learn but not OK for that child's teacher to send that child back home with little or no learning? What’s true of Shanghai's schools, which just outshone American schools in international testing, is also true of the schools that turn out the best (top 2%) talent in Asian countries: families will go to great lengths to ensure that the children have everything they need for doing well in school, and students will put their lives on hold to do well in school. There are no discipline issues; students are in school for exactly one thing and that is to learn. This system of schooling does stifle creativity, and it does ignore those students who cannot keep up, but on the whole, the system raises the level of respect that schools command amongst students. In story after story from teachers, I have noticed that there is a direct correlation between parental involvement and student success.

Sure, it is the job of a school to teach, but then it is the job of parents to provide a safe and encouraging family that prepares the student to learn. Somewhere in all this rhetoric about success for all, differentiated instruction, and constructivist learning, we have unfortunately reached a state where we are almost afraid to ask parents to hold up their side of the social contract.

Wednesday, January 5, 2011

"s" not "z"...

Visual Scalability, SOLO, Children's Data Organisation


Yesterday, I talked to CK and J about our upcoming research work on Large N Data. The surprise find for me was a couple of papers from a set of folks originating in Australia about Children's Data Organisation. Note the "s" in the word Organisation - I have looked high and low for work in this area as support for our work on Students' Understanding of Data Organization. But I was apparently not speaking the same language - literally. In any case, once I found one paper, I found three others from the turn of the century, all talking about children's statistical thinking: one espousing a framework to benchmark learning based on Biggs' SOLO (Structure of the Observed Learning Outcome) framework, another talking about children's representation of data, and the third describing a research protocol where students are asked to create a graphical representation for single variables. I briefly scanned all three. We'd have to commit to the SOLO framework for this to be useful, and CK disagrees with some of the premises of the framework.

Moving on, I read the long-put-aside paper on Visual Scalability as a way of laying the groundwork for the work on Large N research. Written by Eick and Karr in 2000, it creates a structure to use as a basis for designing visualization tools to understand large data sets. They survey the software (data structures & algorithms), the hardware (screen resolutions, disk space, network speeds, CPU limits), and the human factors (perception) that influenced tool design at the time. They predict the trend for each of these and its increased or diminished influence on tool design in the future. Much of what they predict has already come to pass, and systems already exist that incorporate most of their suggestions. However, it is a useful taxonomy to anchor our visualization innovations and a distinct point of view from the currently popular ones rooted in Bertin and Stolte (Tableau et al), addressing the issue of the size of the data set.

To round off this round of reading, here is an inspirational and validating note written as a wrap-up for a gathering of folks talking about Massive Data Sets. This was the Massive Data Set Workshop organized by the Committee on Applied and Theoretical Statistics in 1996! Peter Huber makes the case for the "case".

That's all for now - some interesting ideas on how to "see" large data sets as collections of smaller ones, but that's for a whole other post.









Web App, Graphs and Logs

(This should probably be dated yesterday so consider the blog post for Jan 4th 2011)
Today was the first FC session after the holidays - the Three Musketeers all turned up. They played Mini Golf in the web app prototype and it held up beautifully: one hour, no crashes or serious complaints. I felt very optimistic that this would work!

Some spots that glow:

  • This was the most unprompted use of the calculator that we have seen up to now - and we have the logs to prove it :-)! One kid insisted on using his handheld - said it was much faster!
  • I am going to go ahead and claim some transfer of habits of mind. Today all of them asked for a graph before they asked for a table. In the past they have always had to be prompted to look at graphs. 
  • They all played for the entire hour in the prototype, in the limited screen real estate they had, with non-resizable components and pretty limited functionality, but no real obstacles. So, a successful first outing in the field.
  • I think there is some benefit to this being so similar to Shuffleboard – so to satisfy YK – we have a way in which playing one will make it easier to play the other. This should make it easier for us to create a trajectory of games?

Monday, January 3, 2011

Dichotomous Views of Data

For some time now I have been toying with this idea in my head about an aspect of our thinking on data organization. Recently, there is one issue that recurs frequently - at the heart of which is the distinction between the unit of analysis and unit of observation. Here is my attempt to organize my thoughts and see if they fit in anywhere in our article.

Dichotomy of Data Organization

Expertise in dealing with data is scattered across many different professions: the statistician knows how to synthesize and summarize data; the computer scientist has known for many decades now how to structure and store data for faster search and retrieval; the graphic designer is learning how to communicate insights from this data; and journalists are increasingly making data representations the thrust of how they present evidence. All these point to the emergence of a new profession or discipline - that of data science. For the purpose of this section, we stay away from the communication of data and concentrate on the analysis and organization.

As we struggle to identify the process by which students organize the data that they encounter in unstructured forms, we come across the same dichotomy characterized in many different ways:
* Analysis vs. Observation
* Record vs. Represent
* Collect vs. Store

Different concerns justify different organizations. We observe that students routinely create many different data organizations when asked to record data from a traffic protocol.
[Figure: Snapshot of Road at 8am]
[Figure: Snapshot of Road at 4pm]
Specifically:
  • About half organized data in fully normalized "flat" tables with repeated values for attributes - this is the structure that we had in our mind as the only possible way to comprehensively capture all data in such a way that would allow them later to answer questions about relations among attributes. This turned out not to be the case.
  • A majority used an organizational method that kept the information about individual vehicles together, so that it is possible to determine, e.g., the correlation between distance and speed. Their method is consistent with a hierarchical data model, in that they partitioned information spatially to reflect different case levels (date/time, lane direction, vehicle information).
This motivates a new way to think about how users can record hierarchical data. Note that this is distinct from representing or analyzing hierarchical data. 
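
To make the idea of case levels concrete, here is a minimal Python sketch of one way such a hierarchical recording could look. The field names (`time`, `lanes`, `speed_mph`, `gap_ft`) and all the values are made up for illustration; they are not taken from the student work or from our protocol. The point is only that when each vehicle's attributes stay together, a question such as the distance/speed correlation can still be answered by walking the hierarchy.

```python
from math import sqrt

# Hypothetical snapshot: date/time at the top, then lane direction, then vehicles.
# Each vehicle keeps its own attributes together (the "case").
snapshot = {
    "time": "8am",
    "lanes": {
        "northbound": [
            {"type": "car", "speed_mph": 30, "gap_ft": 40},
            {"type": "truck", "speed_mph": 25, "gap_ft": 30},
            {"type": "car", "speed_mph": 35, "gap_ft": 60},
        ],
        "southbound": [
            {"type": "bus", "speed_mph": 20, "gap_ft": 25},
            {"type": "car", "speed_mph": 40, "gap_ft": 70},
        ],
    },
}


def vehicle_pairs(snap):
    """Walk the hierarchy and collect (gap, speed) for every vehicle."""
    return [
        (v["gap_ft"], v["speed_mph"])
        for vehicles in snap["lanes"].values()
        for v in vehicles
    ]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


pairs = vehicle_pairs(snapshot)
r = pearson(pairs)
```

Nothing here requires the data to be flattened first; the lane and time levels are simply traversed on the way down to the vehicles.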


Motivation for New Ways to Collect Data
The question that motivates the data collection is often too vague to proactively create a fully normalized flat structure of the kind required by traditional computer programs. Instead, the nature of the situation about which data is being gathered may imply an organizational structure that is very different from the one needed for analysis to answer the motivating question.

We draw inspiration from the student representations to create prototypes for two new data structures:
Nested Tables and Partitioned Plot Spaces. Below are the two prototype data structures populated by the data from the above traffic snapshots.

[Figure: Nested Table]
[Figure: Partitioned Plot Space]
Compare them to the more traditional "fully flat" table structure required by traditional computer programs.
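
As a rough illustration of how the two views relate, the sketch below moves between a nested recording and a flat table. This is not our prototype's implementation; the structure, the field names, and the values are hypothetical. `flatten` repeats the parent attributes on every vehicle row, and `nest` regroups the rows, assuming they arrive ordered by time and then by direction.

```python
from itertools import groupby

# Hypothetical nested recording: snapshot -> lane -> vehicle.
nested = [
    {"time": "8am", "lanes": [
        {"direction": "north", "vehicles": [
            {"type": "car", "speed_mph": 30},
            {"type": "truck", "speed_mph": 25},
        ]},
        {"direction": "south", "vehicles": [
            {"type": "car", "speed_mph": 40},
        ]},
    ]},
    {"time": "4pm", "lanes": [
        {"direction": "north", "vehicles": [
            {"type": "bus", "speed_mph": 20},
        ]},
    ]},
]

PARENT_KEYS = ("time", "direction")


def flatten(snapshots):
    """One row per vehicle, with the parent attributes repeated on each row."""
    rows = []
    for snap in snapshots:
        for lane in snap["lanes"]:
            for vehicle in lane["vehicles"]:
                rows.append({
                    "time": snap["time"],
                    "direction": lane["direction"],
                    **vehicle,
                })
    return rows


def nest(rows):
    """Regroup flat rows; assumes rows are ordered by time, then direction."""
    snapshots = []
    for time, by_time in groupby(rows, key=lambda row: row["time"]):
        lanes = []
        for direction, by_lane in groupby(by_time, key=lambda row: row["direction"]):
            vehicles = [
                {k: v for k, v in row.items() if k not in PARENT_KEYS}
                for row in by_lane
            ]
            lanes.append({"direction": direction, "vehicles": vehicles})
        snapshots.append({"time": time, "lanes": lanes})
    return snapshots


flat = flatten(nested)
round_trip = nest(flat)
```

In this toy case the round trip gives back the original nesting, which suggests that the flat table carries the same information and differs only in how much repetition the reader has to look past.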

Our next task is to create seamless transitions between the two views - transitions that are illuminating while being obvious. Data "munging", or fitting the data to the needs of the analysis, is a task that, while arduous, offers many learning opportunities for delving into the structure of the data. We have to be careful not to lose this opportunity while creating the transitions for the student.

to be continued...