Visual Scalability, SOLO, Children's Data Organisation
Yesterday, I talked to CK and J about our upcoming research work on Large N Data. The surprise find for me was a couple of papers, from a group of researchers originating in Australia, about Children's Data Organisation. Note the "s" in the word Organisation: I have looked high and low for work in this area as support for our work on Students' Understanding of Data Organization, but I was apparently not speaking the same language, literally. In any case, once I found one paper, I found three others from the turn of the century, all on children's statistical thinking: one espousing a framework to benchmark learning based on Biggs's SOLO (Structure of the Observed Learning Outcome) framework, another on children's representation of data, and a third describing a research protocol in which students are asked to create a graphical representation for single variables. I briefly scanned all three. We would have to commit to the SOLO framework for this to be useful, and CK disagrees with some of the premises of the framework.
Moving on, I read the long-set-aside paper on Visual Scalability as a way of laying the groundwork for the Large N research. Written by Eick and Karr in 2000, it lays out a structure to use as a basis for designing visualization tools for understanding large data sets. They survey the software (data structures and algorithms), the hardware (screen resolutions, disk space, network speeds, CPU limits), and the human factors (perception) that influenced tool design at the time. They predict the trend for each of these and its increased or diminished influence on tool design in the future. Much of what they predict has already come to pass, and systems already exist that incorporate most of their suggestions. However, it is a useful taxonomy in which to anchor our visualization innovations, and a point of view distinct from the currently popular ones rooted in Bertin and Stolte (Tableau et al.), one that addresses the issue of data set size.
To round off this batch of reading, here is an inspirational and validating note written as a wrap-up for a gathering of people talking about Massive Data Sets. This was the Massive Data Set Workshop organized by the Committee on Applied and Theoretical Statistics in 1996! Peter Huber makes the case for the "case".
That's all for now. There are some interesting ideas on how to "see" large data sets as collections of smaller ones, but that's for a whole other post.