Dichotomy of Data Organization
Expertise in dealing with data is scattered across many professions: the statistician knows how to synthesize and summarize data; the computer scientist has known for decades how to structure and store data for fast search and retrieval; the graphic designer is learning how to communicate insights from data; and journalists are increasingly making data representations central to how they present evidence. All of these point to the emergence of a new profession or discipline, that of data science. For the purposes of this section, we set aside the communication of data and concentrate on its analysis and organization.

As we try to identify the process by which students organize the data that they encounter in unstructured forms, we keep coming across the same dichotomy, characterized in many different ways:
* Analysis vs. Observation
* Record vs. Represent
* Collect vs. Store
Different concerns justify different organizations. We observe that students routinely create many different data organizations when asked to record data from a traffic protocol.
[Figure: Snapshot of Road at 8am]

[Figure: Snapshot of Road at 4pm]
- About half organized the data in fully normalized "flat" tables with repeated values for attributes. This is the structure we had in mind as the only possible way to capture all the data comprehensively, in a form that would later allow students to answer questions about relations among attributes. This turned out not to be the case.
- A majority kept the information about individual vehicles together, so that it is possible to determine, e.g., the correlation between distance and speed. Their organizational method is consistent with a hierarchical data model, in that they partitioned information spatially to reflect different case levels (date/time, lane direction, vehicle information).
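To make the contrast between the two organizations concrete, here is a minimal Python sketch. The attribute names (`time`, `direction`, `vehicle`, `speed_mph`) and all values are hypothetical, invented for illustration; they are not the data from the snapshots. The sketch only aims to show that the same records can be held either as a flat table with repeated values or as a hierarchy of case levels, and that the same question can be answered from both.

```python
# Hypothetical records for illustration only; not the actual snapshot data.

# Flat, fully normalized table: one row per vehicle, with the
# snapshot time and lane direction repeated on every row.
flat_rows = [
    {"time": "8am", "direction": "eastbound", "vehicle": "car", "speed_mph": 25},
    {"time": "8am", "direction": "eastbound", "vehicle": "truck", "speed_mph": 20},
    {"time": "8am", "direction": "westbound", "vehicle": "car", "speed_mph": 40},
    {"time": "4pm", "direction": "eastbound", "vehicle": "car", "speed_mph": 45},
]

# Hierarchical organization: time -> lane direction -> vehicles.
# Each parent value is written once and its children sit beneath it.
nested = {
    "8am": {
        "eastbound": [
            {"vehicle": "car", "speed_mph": 25},
            {"vehicle": "truck", "speed_mph": 20},
        ],
        "westbound": [{"vehicle": "car", "speed_mph": 40}],
    },
    "4pm": {
        "eastbound": [{"vehicle": "car", "speed_mph": 45}],
    },
}


def mean_speed_flat(rows, time):
    """Average speed at one snapshot time, read from the flat table."""
    speeds = [row["speed_mph"] for row in rows if row["time"] == time]
    return sum(speeds) / len(speeds)


def mean_speed_nested(tree, time):
    """Average speed at one snapshot time, read from the hierarchy."""
    speeds = [v["speed_mph"] for lane in tree[time].values() for v in lane]
    return sum(speeds) / len(speeds)
```

Both functions return the same value for a given snapshot time. What differs is where the grouping lives: in the flat table it is implied by repeated attribute values, while in the hierarchy it is part of the structure itself.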
Motivation for New Ways to Collect Data
The question that motivates the data collection is often too vague to proactively create a fully normalized flat structure of the kind required by traditional computer programs. Instead, the nature of the situation about which data is being gathered may imply an organizational structure that is very different from the one needed for analysis to answer the motivating question.
We draw inspiration from the student representations to create prototypes for two new data structures: Nested Tables and Partitioned Plot Spaces. Below are the two prototype data structures, populated with the data from the traffic snapshots above.
[Figure: Nested Table]

[Figure: Partitioned Plot Space]
Our next task is to create seamless transitions between the two views, transitions that are illuminating while being obvious. Data "munging", or fitting data to the needs of the analysis, is a task that, while arduous, offers many learning opportunities for delving into the structure of the data. We have to be careful not to lose this opportunity while creating the transitions for the student.
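One small piece of such munging, moving between a nested organization and a flat table, can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's implementation: the function names `flatten` and `nest`, the attribute names, and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical nested records (time -> lane direction -> vehicles),
# invented for illustration only.
snapshot = {
    "8am": {
        "eastbound": [
            {"vehicle": "car", "speed_mph": 25},
            {"vehicle": "truck", "speed_mph": 20},
        ],
        "westbound": [{"vehicle": "car", "speed_mph": 40}],
    },
    "4pm": {"eastbound": [{"vehicle": "car", "speed_mph": 45}]},
}


def flatten(nested):
    """Nested -> flat: copy the parent attributes onto every vehicle row."""
    rows = []
    for time, lanes in nested.items():
        for direction, vehicles in lanes.items():
            for vehicle in vehicles:
                rows.append({"time": time, "direction": direction, **vehicle})
    return rows


def nest(rows):
    """Flat -> nested: group rows by time, then by lane direction."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Everything except the two grouping attributes stays with the vehicle.
        vehicle = {k: v for k, v in row.items() if k not in ("time", "direction")}
        tree[row["time"]][row["direction"]].append(vehicle)
    # Convert the defaultdicts back to plain dicts for comparison and display.
    return {time: dict(lanes) for time, lanes in tree.items()}
```

Under these assumptions, `nest(flatten(snapshot))` reproduces `snapshot`, so no information is lost in either direction. The sketch also makes visible the choice a student would otherwise make by hand: which attributes become parent levels and which stay with the individual case.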
to be continued...