Lecture 1.2: Data
-
Data Sources
-
Simulations
ex: CFD, environmental modeling, virtual crash tests
-
Sensors/Scanners
ex: medical diagnosis, satellites, emissions monitors
-
Surveys/Records
ex: census, consumer tracking, polls, observational studies
-
Equations
ex: math, health effects models
-
Data Characteristics
-
Continuity
-
Continuous: nature is continuous (for most purposes), but can only be represented implicitly
-
Discrete: anything sampled or stored on digital media
representation error
possible aliasing
artifacts of sampling
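A minimal sketch of two of the discrete-data effects listed above, using only the Python standard library. The signal frequencies, the sampling rate, and the 8-bit quantizer are illustrative assumptions, not values from the lecture.

```python
import math

def sample(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*freq*t) at times t = k / rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz)
            for k in range(n_samples)]

def quantize(x, levels=256):
    """Snap x in [0, 1] to the nearest of `levels` evenly spaced values."""
    step = 1.0 / (levels - 1)
    return round(x / step) * step

# Aliasing: a 9 Hz sine sampled at only 10 Hz produces the same samples
# as a sign-flipped 1 Hz sine, so the two cannot be told apart afterwards.
RATE_HZ = 10.0
high = sample(9.0, RATE_HZ, 20)
low = sample(1.0, RATE_HZ, 20)
is_aliased = all(math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low))

# Representation error: storing values in 8 bits moves each one by at most
# half a quantization step.
max_error = max(abs(quantize(k / 1000) - k / 1000) for k in range(1001))
half_step = 0.5 / 255
```

Sampling faster than twice the highest frequency present (here, above 18 Hz) would avoid the aliasing in this example.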
-
Structure
-
Definitions
-
Topology: connectivity (e.g., which points form a triangle)
-
Geometry: realization of the topology in space (point coordinates)
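One way to picture the two definitions is to keep the connectivity fixed and swap the coordinates. The sketch below is only an illustration: the names `triangles`, `unit_square`, and `stretched` are made up for this example.

```python
def triangle_area(a, b, c):
    """Area of a 2D triangle from the cross product of two edge vectors."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) / 2.0

def total_area(points, triangles):
    """Sum the areas of all triangles, looking coordinates up by index."""
    return sum(triangle_area(*(points[i] for i in tri)) for tri in triangles)

# Topology: which point indices are connected into triangles.
triangles = [(0, 1, 2), (1, 3, 2)]

# Geometry: two different coordinate sets for the same four points.
unit_square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
stretched = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

area_unit = total_area(unit_square, triangles)
area_stretched = total_area(stretched, triangles)
```

The connectivity list is identical in both cases. Only the coordinates, and therefore the measured areas, differ.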
-
Elements
-
Points: located where the data value is known (geometry)
-
Cells: define how values are interpolated between points (topology)
common types: point, line, triangle, quad, tetra, voxel
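As a hedged illustration of how a cell supplies interpolation parameters, the sketch below uses barycentric (linear) interpolation inside a single triangle cell. The choice of barycentric weights and the sample values are assumptions made for this example. Other cell types use other interpolation functions.

```python
def barycentric_weights(p, a, b, c):
    """Weights of 2D point p with respect to triangle vertices a, b, c."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b

def interpolate(p, cell, points, values):
    """Estimate the data value at p from the values at the cell's points."""
    a, b, c = (points[i] for i in cell)
    weights = barycentric_weights(p, a, b, c)
    return sum(w * values[i] for w, i in zip(weights, cell))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # geometry
values = [10.0, 20.0, 60.0]                     # data known at the points
cell = (0, 1, 2)                                # topology: one triangle

at_vertex = interpolate((1.0, 0.0), cell, points, values)
at_centroid = interpolate((1 / 3, 1 / 3), cell, points, values)
```

At a vertex the interpolated value reproduces the stored value, and at the centroid it is the average of the three point values.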
-
Structured: inherent spatial relationship among points
relatively efficient storage: topology is implicit
-
regular
can be represented implicitly (3 x 3 values: dimensions, origin, aspect)
ex: medical data
-
rectilinear
can be represented semi-implicitly (nx + ny + nz coordinate values)
ex: CFD -- refinement around objects
-
curvilinear
geometry represented explicitly (3*nx*ny*nz coordinate values)
ex: CFD -- flow along river
ease of computation
wide array of visualization algorithms
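The storage counts quoted for the three structured grid types can be sketched as follows. The function names are hypothetical, and the regular-grid count of 9 assumes three values each for dimensions, origin, and spacing.

```python
def regular_point(index, origin, spacing):
    """Regular grid: coordinates follow from the index alone."""
    return tuple(o + i * s for i, o, s in zip(index, origin, spacing))

def rectilinear_point(index, xs, ys, zs):
    """Rectilinear grid: one coordinate array per axis."""
    i, j, k = index
    return (xs[i], ys[j], zs[k])

def flat_index(i, j, k, nx, ny):
    """Implicit topology: position of point (i, j, k) in a flat value array."""
    return i + nx * (j + ny * k)

def geometry_storage(nx, ny, nz):
    """Number of values needed to describe the geometry of each grid type."""
    return {
        "regular": 9,                    # dimensions, origin, spacing
        "rectilinear": nx + ny + nz,     # per-axis coordinate arrays
        "curvilinear": 3 * nx * ny * nz, # x, y, z stored for every point
    }

counts = geometry_storage(100, 100, 100)
corner = regular_point((2, 0, 1), (0.0, 0.0, 0.0), (0.5, 0.5, 2.0))
refined = rectilinear_point((1, 0, 2), [0.0, 0.1, 1.0], [0.0, 5.0], [0.0, 1.0, 4.0])
offset = flat_index(1, 2, 3, nx=4, ny=5)
```

For a 100 x 100 x 100 grid this gives 9, 300, and 3,000,000 stored values respectively. In all three cases the connectivity is never stored, because neighbors follow from the index arithmetic.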
-
Unstructured: no (or unknown) spatial relationship among points
ex: FEM, structural analysis, census, monitoring devices
flexibility
often reality
more limited array of visualization algorithms
-
Dimension: number of independent variables (2D, 3D, etc.)
usually means number of spatial/temporal dimensions
-
Multiplicity (values per position)
-
scalar: single value per position
multivariate: multiple values per position
multiple scalars
vector
tensor
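A small sketch of how scalar and multivariate data might be held per position. The field names and numbers are invented for illustration, and the component counts assume 3D space with a rank-2 tensor.

```python
import math

# Components stored per position in 3D (tensor assumed to be rank 2).
COMPONENTS_3D = {"scalar": 1, "vector": 3, "tensor": 9}

# Hypothetical fields sampled at the same two positions.
fields = {
    "temperature": {"kind": "scalar", "values": [290.0, 295.5]},
    "velocity": {"kind": "vector",
                 "values": [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]},
}

def magnitude(vector):
    """Reduce a vector to a scalar by taking its Euclidean length."""
    return math.sqrt(sum(component * component for component in vector))

# Deriving a scalar field (speed) from a vector field (velocity).
speed = [magnitude(v) for v in fields["velocity"]["values"]]

values_per_position = sum(COMPONENTS_3D[f["kind"]] for f in fields.values())
```

Reducing a vector to its magnitude is one common way to apply scalar visualization techniques to multivariate data.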
-
Type
-
Scale
-
Nominal: just names or categories or identifiers
can say "this one is different from that one"
ex: county, land use, ethnicity or race, tissue type
-
Ordinal: values are ordered
can say "this one is bigger than that one"
ex: preference, ranking
-
Interval: constant step size
can say "the difference between these two is the same as the difference between those two"
ex: test scores, degrees Fahrenheit
-
Ratio: meaningful zero
can say "this one is twice as big as that one"
ex: temperature in kelvins, income, percent below poverty line, wind speed
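The difference between interval and ratio scales can be checked numerically with the temperature examples above. This is a minimal sketch: the `ALLOWED` table simply restates the "can say" lines, and the specific temperatures are arbitrary.

```python
# Comparisons that are meaningful on each scale, per the list above.
ALLOWED = {
    "nominal": {"=="},
    "ordinal": {"==", "<"},
    "interval": {"==", "<", "-"},
    "ratio": {"==", "<", "-", "/"},
}

def fahrenheit_to_kelvin(deg_f):
    """Convert an interval-scale temperature to a ratio-scale one."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

# Interval: equal steps in Fahrenheit are equal steps in kelvins.
diff_low = fahrenheit_to_kelvin(40.0) - fahrenheit_to_kelvin(20.0)
diff_high = fahrenheit_to_kelvin(80.0) - fahrenheit_to_kelvin(60.0)

# Ratio: 80 F is not "twice as hot" as 40 F, because 0 F is not a true zero.
naive_ratio = 80.0 / 40.0
kelvin_ratio = fahrenheit_to_kelvin(80.0) / fahrenheit_to_kelvin(40.0)
```

The two differences agree, while the ratio drops from 2.0 in Fahrenheit to roughly 1.08 once the values are expressed on a scale with a meaningful zero.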
-
Data Representation
-
Compact: efficient memory use
structured schemes, unstructured schemes, sparse matrices, shared vertices
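Shared vertices are one of the compaction techniques named above. The sketch below converts a "triangle soup", where every triangle repeats its own coordinates, into a point list plus index triples. The function name is hypothetical, and matching vertices by exact coordinate equality is a simplifying assumption.

```python
def index_triangles(soup):
    """Replace repeated coordinates with indices into a shared point list."""
    points, lookup, cells = [], {}, []
    for triangle in soup:
        cell = []
        for vertex in triangle:
            if vertex not in lookup:          # first time this point is seen
                lookup[vertex] = len(points)
                points.append(vertex)
            cell.append(lookup[vertex])
        cells.append(tuple(cell))
    return points, cells

# Two triangles that share an edge, each storing all three of its corners.
soup = [
    ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)),
    ((1.0, 0.0), (1.0, 1.0), (0.0, 1.0)),
]

points, cells = index_triangles(soup)
soup_coords = sum(len(triangle) for triangle in soup) * 2   # floats stored
indexed_coords = len(points) * 2                            # floats stored
```

Here the coordinate storage falls from 12 floats to 8, at the cost of 6 small integer indices. The saving grows with mesh size because interior vertices are typically shared by several cells.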
-
Efficient: computationally accessible; retrieve and store in constant time
structured schemes
-
Mappable: straightforward conversions
native -> representation: simple conversion, no information lost
representation -> graphics primitives: especially for interactive display
-
Minimal coverage: manageable number of options
few variants which work for a wide range of data sets
-
Simple
easier to use
easier to optimize
errors less likely
-
Data Transformations
-
Interpolation
-
Aggregation
-
Smoothing
-
Simplification
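The four transformations listed above can each be sketched in one dimension. These are deliberately simple stand-ins chosen for illustration (linear interpolation, bin averaging, a moving average, and keeping every n-th sample), not the specific methods the lecture has in mind.

```python
def lerp(x, x0, y0, x1, y1):
    """Interpolation: estimate a value between two known samples."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def aggregate(values, size):
    """Aggregation: replace each bin of `size` samples with its mean."""
    bins = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(b) / len(b) for b in bins]

def smooth(values, window=3):
    """Smoothing: moving average, with the window clipped at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        neighborhood = values[max(0, i - half):i + half + 1]
        out.append(sum(neighborhood) / len(neighborhood))
    return out

def decimate(values, step):
    """Simplification: keep only every `step`-th sample."""
    return values[::step]

data = [0.0, 2.0, 4.0, 10.0, 4.0, 2.0]

midpoint = lerp(0.5, 0, data[0], 1, data[1])
binned = aggregate(data, 2)
smoothed = smooth(data)
reduced = decimate(data, 2)
```

Note that smoothing lowers the peak at the fourth sample and decimation drops it entirely, which is one way transformations can introduce the quality issues discussed next.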
-
Data Quality
-
Missing data
-
Uncertain data
-
Representation error
-
Sampling artifacts