Fathom That!

Monday, June 6, 2011

It doesn't matter!

So the last research meeting with CK, JK and BF was a very compelling discussion about what hierarchical structures are, and whether students are really exhibiting a sense of hierarchical data or are merely coding efficiently. Well, for me the one-line summary of this discussion is "It doesn't matter!"

I have felt since the very first few representations began trickling in from our pilot subjects - it doesn't matter whether they are row by column or not. It is the computer that makes the row by column fully normalized flat table be the "best" data organization for capturing all the information in the most flexible way. NH, in one of the earlier SERG meetings, mentioned that he is really tired of the tyranny of the rectilinear data structures!

In a recent administration to a large group of students, of all the representations we received, only about 40% were flat tables. Yet with the exception of 1 student, everyone captured all the information they needed to construct a fully-normalized structure. The sheer variety of structures that were on display by the students was mind-boggling. After having thought about all the ways in which we can structure the data and what makes for an efficient organization, we continue to be surprised at how many different "exhaustive" organizations the students can create.

We went into this work with a hypothesis that students would have trouble creating structures for hierarchical data and that the only really "correct" structures - as in exhaustive in terms of capturing the data and case-centric in terms of maintaining the correlation amongst various attributes of a single object - would be fully normalized, flat row by column tables. We have instead seen so many different structures that are both exhaustive and case-centric that, as a software designer, the path to me is very clear.

Our software needs to provide a way for users to enter and organize data in many different ways and make transparent to the user the isomorphism between, for example, the three-way nested table, the partitioned plot space and the fully flat row by column table.

The three representations above all record all the information in the situation and they maintain correlation among the values of the different attributes for a single object (the case). Any of these representations allows us to answer multi-variable correlation questions like: Are westbound trucks faster than eastbound cars? In that sense, they are equivalent representations and the user should be able to record data in one of these representations but be able to use the data in any of these forms or other forms that they may create.
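To make the equivalence a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the attribute names (`direction`, `vehicle`, `speed`), the speed values, and the helper functions are invented, and it is not how our software does or will do this. It only shows that a flat table, a nested table and a partitioned space of cells can hold the same cases, and that the correlation question can be answered from any of them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical traffic observations; the values are invented for illustration.
flat = [
    {"direction": "west", "vehicle": "truck", "speed": 62},
    {"direction": "west", "vehicle": "car", "speed": 55},
    {"direction": "east", "vehicle": "car", "speed": 58},
    {"direction": "east", "vehicle": "car", "speed": 50},
    {"direction": "west", "vehicle": "truck", "speed": 66},
]


def flat_to_nested(rows):
    """Flat table -> nested table: direction -> vehicle -> list of speeds."""
    nested = defaultdict(lambda: defaultdict(list))
    for row in rows:
        nested[row["direction"]][row["vehicle"]].append(row["speed"])
    return {d: dict(v) for d, v in nested.items()}


def nested_to_flat(nested):
    """Nested table -> flat table: one row per case."""
    return [
        {"direction": d, "vehicle": v, "speed": s}
        for d, vehicles in nested.items()
        for v, speeds in vehicles.items()
        for s in speeds
    ]


def flat_to_partitioned(rows):
    """Flat table -> partitioned space: one cell per (direction, vehicle)."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["direction"], row["vehicle"])].append(row["speed"])
    return dict(cells)


def row_key(row):
    """Sort key so two tables can be compared regardless of row order."""
    return (row["direction"], row["vehicle"], row["speed"])


nested = flat_to_nested(flat)
cells = flat_to_partitioned(flat)

# Round trip: the nested form loses row order but none of the cases.
round_trip = nested_to_flat(nested)
same_cases = sorted(flat, key=row_key) == sorted(round_trip, key=row_key)

# The same question, answered from two different representations.
west_trucks = mean(cells[("west", "truck")])
east_cars = mean(nested["east"]["car"])
print(same_cases, west_trucks > east_cars)
```

In this sketch the only thing the nested and partitioned forms give up is the original row order, which carries no information about a case here. If order mattered (for example, distance from the preceding vehicle), it would have to be stored as an attribute of the case.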

Now to make that happen in software...

Wednesday, March 23, 2011

The Eagle has landed! 4 times now...

PHEW! Lunar Lander landed in DP's class one Thursday. Then on a Wed a week or so later, the eagle was on a marathon run - 3 classes in as many hours! Here is what their activity looked like at a glance.

There is so much more to say because you know for the first time, we have nearly a hundred thousand data points that are being cleaned and "dashboarded" by BF - the data magician. Watch this space for more...

Tuesday, March 1, 2011

Tool-Symbol, Attention-Perception

Finished two more chapters of Vygotsky's Mind in Society. It is not dense - in fact the introduction was far denser than these actual chapters. The first two chapters are all about how the use and development of language affects the development of tool and symbol use in children as well as the development of attention and perception.

In thinking about why this should be at all relevant to the work that we are doing with data - here is what I come up with:

What language are the students using to organize their data?
If we can discern the "words" in their language will it help us to build tools that can aid their "expression"?
To tie back tightly to what Vygotsky says - how does the development of a "data sense" proceed in tandem with the development of a "data language"?

If we are to be a data processing environment akin to a word processor then it does seem important to know how the language works so that we can provide all the necessary tools to craft a well formatted and thought-out narrative in data.

The last thought to keep in mind as I read through more of the book is, I guess, where and how does the "social" aspect fit into our conceptions of what we hope to build and learn? Is it simply that the collaborative data analysis space will contribute to learning, or is it something deeper that arises from being in a classroom rather than by yourself?

all for now...

References:
Vygotsky, L. S. (1978). Mind in society: the development of higher psychological processes. Cambridge, Mass: Harvard University Press.

Friday, January 28, 2011

Two more data structures...

At long last, we are done "piloting" our materials for the interviews. I conducted two interviews yesterday at GHS - real deal. A sad commentary that pulling kids out of class for 30 mins means I get many more volunteers than an offer of lunch! (DP - the teacher - says that so long as their brains are engaged in some hard thinking, he's fine with me pulling people out of class). In any case, two kind souls R and J agreed to come.

R was quick - and organized all the information in a sectioned fashion. Kind of hierarchical but not strictly organized in the nested table fashion that we are thinking of. He created a partitioning of the space by categorical variables that were "above" the vehicle in some sense and then proceeded to code the vehicle type and speed together for all vehicles. He then did a second pass through both segments and added the information about the distance from the preceding vehicle. He had some way to answer all questions but did not go so far as to set up a two way correlation for the last two questions. The flat data structure for him appeared to be more organized and "detailed" - not that he had missing values but the information was "clearer" he said.

J favoured words over numbers. "I like to describe in words rather in numbers" - his own words. The recording took longer mainly because he was narrating the information and needed more time to write it all out. He initially made the mistake of not recording the vehicle type and recorded all the vehicles as cars. Since the rest of his information was complete, I hazard that this had more to do with our colloquial tendency to refer to any vehicle as a "car" than it had to do with considering an attribute unimportant. This meant that he was unable to answer two of the questions. He went back and fixed this information. Then, he was able to answer the questions that he had missed the first time around. J found the flat structure messier and insisted that his was easier to read, but he did acknowledge that if you like numbers more then it was a better structure.

Spots that glow:

Both had a case-centric view of data. They kept the attributes for a case together - even in J's narrative style.
BF's fear that we are now going to get flat tabular structures is unfounded, at least for this small sample.
Protocol was easier to administer this time - had reasonable success with "talk out aloud". So maybe the warmup problems help? Not sure...
In any case, the task is definitely clear now.

Thursday, January 20, 2011

Parents and Polish: Reflections on the DRK-12 PI Meeting

The two memorable revelations for me of the 2010 DR K-12 PI meeting were the exceptionally polished software products on display and the lack of discussion about parental responsibility for learning.

Speaking as a software developer, the quality of the software products was an awakening in that projects are no longer content to create materials that are not commercial production quality. This is a really heartening development because, after all, if our goal is to make the most difference to the greatest number of students, then we must look beyond our conventional avenues of dissemination and distribution. As an aside, “publisher” should not be a dreaded word: why should the avenue of dissemination be treated like a condition to be tolerated rather than a partner to be embraced? The presence of more than a couple of "research products" in the App Store is a good start. Perhaps CADRE would do well to establish a SIG that focuses on product development. Often, when presented with the product of a research project, as an emerging researcher you wonder what went into that product. Similarly, when a research project creates a software product, it is beneficial for other projects to be able to discuss issues of how to recruit programmers, graphic designers, and project managers on an NSF budget where you cannot match an industry salary. It was very apparent, at least in a couple of the projects, that the software was not the output of a technically savvy grad student. So, I'd love to see not only more such products emerge but also a community develop that provides support to projects creating software products.

To move on to my second revelation, I want to begin by saying that at education conferences, I avoid sessions that focus on equity. Equity for me is a topic that should deal with teachers working in heterogeneous classrooms where the heterogeneity is a function of differences in learning styles of students rather than of differences in home environments and parental engagement. It is really strange to me that in this climate of strong calls for teacher accountability there is not an equivalent call for parental accountability. It is not a 10th grade math teacher's job to deal with the fact that a student in her class is failing in math because they cannot read the content. That may seem harsh, but I am sorry, that's just not her job. Cathy Black, chancellor of schools in New York City, when touring schools that work said, “Where there’s a strong and effective principal, where parents are committed, you have great schools.”

Why is it OK for a parent to send a child to school with little or no preparation in terms of motivation to learn but not OK for that child's teacher to send that child back home with little or no learning? What’s true of Shanghai's schools, which just outshone American schools in international testing, is also true of the schools that turn out the best (top 2%) talent in Asian countries: families will go to great lengths to ensure that the children have everything they need for doing well in school, and students will put their lives on hold to do well in school. There are no discipline issues; students are in school for exactly one thing and that is to learn. This system of schooling does stifle creativity, and it does ignore those students who cannot keep up, but on the whole, the system raises the level of respect that schools command amongst students. In story after story from teachers, I have noticed that there is a direct correlation between parental involvement and student success.

Sure, it is the job of a school to teach, but then it is the job of parents to provide a safe and encouraging family that prepares the student to learn. Somewhere in all this rhetoric about success for all and differentiated instruction and constructivist learning, we are unfortunately in a state where we are almost afraid to ask parents to hold up their side of the social contract.

Wednesday, January 5, 2011

"s" not "z"...

Visual Scalability, SOLO, Children's Data Organisation

Yesterday, I talked to CK and J about our upcoming research work on Large N Data. The surprise find for me was a couple of papers from a set of folks originating in Australia about Children's Data Organisation. Note the "s" in the word Organisation - I have looked literally high and low for work in this area as support for our work on Students' Understanding of Data Organization. But I was apparently not speaking the same language - literally. In any case, once I found one paper, I found three others from the turn of the century all talking about children's statistical thinking - one espousing a framework to benchmark learning based on Biggs' SOLO (Structure of the Observed Learning Outcome) framework, another talking about children's representation of data and the third describing a research protocol where students are asked to create a graphical representation for single variables. Briefly scanned all three - we'd have to commit to the SOLO framework for this to be useful - CK disagrees with some of the premises of the framework.

Moving on, read the long-put-aside paper on Visual Scalability as a way of laying the groundwork for the work on Large N research. Written by Eick and Karr in 2000, it creates a structure to use as a basis for designing visualization tools to understand large data sets. They survey the software (data structures & algorithms), the hardware (screen resolutions, disk space, network speeds, CPU limits), and the human factors (perception) that influenced tool design at the time. They predict the trend on each of these and their increased or diminished influence on tool design in the future. Much of what they predict has already come to pass and systems already exist that incorporate most of their suggestions. However, it is a useful taxonomy to anchor our visualization innovations and, in addressing the issue of the size of the data set, a distinct point of view from the currently popular ones rooted in Bertin and Stolte (Tableau et al).

To round off this round of reading, here is an inspirational and validating note written as a wrap-up for a gathering of folks talking about Massive Data Sets. This was the Massive Data Set Workshop organized by the Committee on Applied and Theoretical Statistics in 1996! Peter Huber makes the case for the "case".

That's all for now - some interesting ideas on how to "see" large data sets as collections of smaller ones but that's for a whole other post.

Web App, Graphs and Logs

(This should probably be dated yesterday so consider the blog post for Jan 4th 2011)
Today was the first FC session after the holidays - the Three Musketeers all turned up. They played Mini Golf in the web app prototype - it held up beautifully - one hour - no crashes, or serious complaints. It felt very optimistic that this would work!

Some spots that glow:

This was the most unprompted use of the calculator that we have seen up to now - and we have the logs to prove it :-)! One kid insisted on using his handheld - said it was much faster!
I am going to go ahead and claim some transfer of habits of mind. Today all of them asked for a graph before they asked for a table. In the past they have always had to be prompted to look at graphs.
They all played for the entire hour in the prototype, in the limited screen real estate they had, with non-resizable components and pretty limited functionality, but no real obstacles. So, a successful first outing in the field.
I think there is some benefit to this being so similar to Shuffleboard – so to satisfy YK – we have a way in which playing one will make it easier to play the other. This should make it easier for us to create a trajectory of games?