Should we store raw data or references?
Exploring the tradeoff between storing all the raw data and simplifying it with references.
Background
Recently, I have been implementing a feature that queries and stores the tracking information of shipments. I noticed that the raw tracking data contains a lot of redundancy. My question is: is it worth storing all the raw tracking information, or should we store references instead?
Test environment
The new operator turns a function call into a constructor call, which returns a new object.
JSON Data Analysis
Let’s take a closer look at the tracking information data.
The serviceEvent data pattern repeats across 62 kinds of eventCode, depending on the shipment vendor. The serviceArea, however, depends on the shipment path. If the platform serves the whole world, there could be many possible data patterns. If it only serves a limited number of countries, the data patterns would also be limited. So, should we
① Use three models, Shipment, ServiceEvent and ServiceArea, and store references to the particular ServiceEvent and ServiceArea in a shipment?
② Store the raw data directly, without deduplicating the repeated ServiceEvent and ServiceArea in a shipment?
Which one works better under different situations?
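To make the two options concrete, here is a minimal sketch in Python. It is not the implementation from this project: only serviceEvent, eventCode and serviceArea come from the data described above, while the sample events, the other field names (timestamp, description, serviceAreaCode) and the helper functions normalize and denormalize are assumptions made up for illustration.

```python
import json

# Hypothetical raw tracking events (option 2: store everything as received).
# Only serviceEvent, eventCode and serviceArea are named in the article;
# all other fields and values are assumed.
HKG = {"serviceAreaCode": "HKG", "description": "Hong Kong"}
LAX = {"serviceAreaCode": "LAX", "description": "Los Angeles"}
raw_events = [
    {"timestamp": "T1", "serviceEvent": {"eventCode": "PU", "description": "Shipment picked up"}, "serviceArea": HKG},
    {"timestamp": "T2", "serviceEvent": {"eventCode": "DF", "description": "Departed facility"}, "serviceArea": HKG},
    {"timestamp": "T3", "serviceEvent": {"eventCode": "AF", "description": "Arrived at facility"}, "serviceArea": LAX},
    {"timestamp": "T4", "serviceEvent": {"eventCode": "DF", "description": "Departed facility"}, "serviceArea": LAX},
]


def normalize(events):
    """Option 1: split raw events into lookup tables plus per-shipment references."""
    service_events = {}  # eventCode -> full ServiceEvent record
    service_areas = {}   # serviceAreaCode -> full ServiceArea record
    shipment = []        # one small reference row per tracking event
    for event in events:
        code = event["serviceEvent"]["eventCode"]
        area = event["serviceArea"]["serviceAreaCode"]
        # Keep the first copy of each repeated record.
        service_events.setdefault(code, event["serviceEvent"])
        service_areas.setdefault(area, event["serviceArea"])
        shipment.append({"timestamp": event["timestamp"], "eventCode": code, "serviceAreaCode": area})
    return shipment, service_events, service_areas


def denormalize(shipment, service_events, service_areas):
    """Rebuild the raw shape by resolving each reference (the read-time cost of option 1)."""
    return [
        {
            "timestamp": row["timestamp"],
            "serviceEvent": service_events[row["eventCode"]],
            "serviceArea": service_areas[row["serviceAreaCode"]],
        }
        for row in shipment
    ]


shipment, service_events, service_areas = normalize(raw_events)
raw_size = len(json.dumps(raw_events))  # bytes stored per shipment with option 2
ref_size = len(json.dumps(shipment))    # bytes stored per shipment with option 1
print(raw_size, ref_size, len(service_events), len(service_areas))
```

The lookup tables are shared by every shipment, so their cost is paid once, while the per-shipment saving repeats for each stored shipment. The price is the extra lookup in denormalize on every read, and the need to keep the tables in sync if the vendor changes an event description.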