Store Raw Data or Reference?

Exploring a tradeoff: should we store all of the data, or simplify it?

Background

Recently, I have been implementing a feature that queries and stores the tracking information of shipments. I noticed that the raw tracking data contains a lot of redundancy. My question: is it worth storing all of the raw tracking information, or should we store references instead?

Test environment

Node.js v9.2.1
MongoDB v3.6.3

JSON Data Analysis

Let’s take a close look at the tracking data.

...
"shipmentEvents": [{
      "dateTime": "2018-04-07T02:32:58.000Z",
      "serviceEvent": {
        "eventCode": "PU",
        "description": "Shipment pick up"
      },
      "signatory": "",
      "serviceArea": {
        "serviceAreaCode": "XXX",
        "description": "XXXX-XXX"
      }
    }, {
      "dateTime": "2018-04-07T00:46:00.000Z",
      "serviceEvent": {
        "eventCode": "RR",
        "description": "Response received"
      },
      "signatory": "",
      "serviceArea": {
        "serviceAreaCode": "LAX",
        "description": "LOS ANGELES GATEWAY,CA-USA"
      }
    }, {
      "dateTime": "2018-04-07T05:56:06.000Z",
      "serviceEvent": {
        "eventCode": "AF",
        "description": "Arrived facility"
      },
      "signatory": "",
      "serviceArea": {
        "serviceAreaCode": "YYY",
        "description": "YYYY-YYY"
      }
    }, {
      "dateTime": "2018-04-07T06:09:09.000Z",
      "serviceEvent": {
        "eventCode": "PL",
        "description": "Processed at location"
      },
      "signatory": "",
      "serviceArea": {
        "serviceAreaCode": "XXX",
        "description": "XXXX-XXX"
      }
    },
    ...]
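
To get a feel for how much of this repeats, a short script can count how often each code appears within one shipment. This is a minimal sketch in plain Node.js, assuming the events have the shape shown above; the helper name countByCode and the trimmed sample array are made up for illustration and are not part of the vendor API.

```javascript
// Trimmed copy of the sample above: only the two nested fields of interest are kept.
const shipmentEvents = [
  { serviceEvent: { eventCode: 'PU', description: 'Shipment pick up' },
    serviceArea: { serviceAreaCode: 'XXX', description: 'XXXX-XXX' } },
  { serviceEvent: { eventCode: 'RR', description: 'Response received' },
    serviceArea: { serviceAreaCode: 'LAX', description: 'LOS ANGELES GATEWAY,CA-USA' } },
  { serviceEvent: { eventCode: 'AF', description: 'Arrived facility' },
    serviceArea: { serviceAreaCode: 'YYY', description: 'YYYY-YYY' } },
  { serviceEvent: { eventCode: 'PL', description: 'Processed at location' },
    serviceArea: { serviceAreaCode: 'XXX', description: 'XXXX-XXX' } },
];

// Hypothetical helper: count occurrences of each code for one nested field.
function countByCode(events, field, codeKey) {
  const counts = new Map();
  for (const event of events) {
    const code = event[field][codeKey];
    counts.set(code, (counts.get(code) || 0) + 1);
  }
  return counts;
}

const eventCounts = countByCode(shipmentEvents, 'serviceEvent', 'eventCode');
const areaCounts = countByCode(shipmentEvents, 'serviceArea', 'serviceAreaCode');

console.log('distinct event codes:', eventCounts.size);
console.log('distinct service areas:', areaCounts.size);
console.log('events per service area:', Array.from(areaCounts));
```

Any code with a count above one means the same code and description pair is stored more than once in that shipment.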

The serviceEvent data repeats: according to the shipment vendor, there are only 62 kinds of eventCode. The serviceArea data, however, depends on the shipment path. If the platform serves the whole world, there could be many possible values. If it only serves a limited number of countries, the set of values would also be limited. So, should we:

① use three models, Shipment, ServiceEvent and ServiceArea, where each shipment stores references to the particular ServiceEvent and ServiceArea it needs, or

② store the raw data directly, without worrying about repeated ServiceEvent and ServiceArea data in a shipment?

Which one works better under different situations?
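
One rough way to start comparing the two options is to convert a raw event list into a reference-based shape and look at the size of each. The sketch below is plain Node.js with no database involved. It assumes the codes themselves (eventCode, serviceAreaCode) are used as references; a real MongoDB schema might use ObjectId references instead. The function names normalizeShipment and denormalizeShipment, and the output shape, are hypothetical.

```javascript
// Two raw events in the shape of the vendor response shown earlier.
const rawEvents = [
  { dateTime: '2018-04-07T02:32:58.000Z',
    serviceEvent: { eventCode: 'PU', description: 'Shipment pick up' },
    signatory: '',
    serviceArea: { serviceAreaCode: 'XXX', description: 'XXXX-XXX' } },
  { dateTime: '2018-04-07T06:09:09.000Z',
    serviceEvent: { eventCode: 'PL', description: 'Processed at location' },
    signatory: '',
    serviceArea: { serviceAreaCode: 'XXX', description: 'XXXX-XXX' } },
];

// Option 1 (sketch): split raw events into lookup tables plus compact references.
function normalizeShipment(events) {
  const serviceEvents = {}; // eventCode -> description
  const serviceAreas = {};  // serviceAreaCode -> description
  const refs = events.map((e) => {
    serviceEvents[e.serviceEvent.eventCode] = e.serviceEvent.description;
    serviceAreas[e.serviceArea.serviceAreaCode] = e.serviceArea.description;
    return {
      dateTime: e.dateTime,
      signatory: e.signatory,
      eventCode: e.serviceEvent.eventCode,
      serviceAreaCode: e.serviceArea.serviceAreaCode,
    };
  });
  return { serviceEvents, serviceAreas, events: refs };
}

// Rebuild the raw shape from the references, as a read path would have to.
function denormalizeShipment(normalized) {
  return normalized.events.map((ref) => ({
    dateTime: ref.dateTime,
    serviceEvent: {
      eventCode: ref.eventCode,
      description: normalized.serviceEvents[ref.eventCode],
    },
    signatory: ref.signatory,
    serviceArea: {
      serviceAreaCode: ref.serviceAreaCode,
      description: normalized.serviceAreas[ref.serviceAreaCode],
    },
  }));
}

const normalized = normalizeShipment(rawEvents);
const rawBytes = Buffer.byteLength(JSON.stringify(rawEvents));
// Only the reference list is stored per shipment; the lookup tables are shared.
const refBytes = Buffer.byteLength(JSON.stringify(normalized.events));

console.log('raw bytes per shipment:', rawBytes);
console.log('reference bytes per shipment:', refBytes);
```

JSON byte counts are only a proxy for storage size in MongoDB, and the reference-based shape pays for its smaller documents with an extra lookup when the full event is read back.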

To be continued.

Ian Ma