ETA Creation

To approach the Estimated Time of Arrival (ETA) prediction task on the provided dataset, let's break the problem down and work its key aspects into your workflow.


Key Concepts in the Statement

  1. Objective:

  2. Input Query:

  3. Output:

    • Build a model F that maps q to Y, predicting arrival times for unfinished packages.

  4. Dataset Usage:

    • LaDe-D (Delivery dataset) is the primary dataset.

    • LaDe-P (Pickup dataset) can also be used if additional features are required.

    • Chronological splitting of data into training, validation, and test sets in a 6:2:2 ratio ensures no data leakage and maintains the temporal order of events.
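The chronological 6:2:2 split can be sketched as follows. This is a minimal sketch, assuming the data sits in a pandas DataFrame and is ordered by a single timestamp column (accept_time here); the helper name chronological_split and the toy data are hypothetical. It splits by row count after sorting; splitting on calendar date boundaries is an equally valid reading of the same idea.

```python
import pandas as pd


def chronological_split(df, time_col, ratios=(0.6, 0.2, 0.2)):
    """Split rows into train/validation/test in time order, without shuffling."""
    # Sort oldest to newest so later rows never end up in an earlier split.
    ordered = df.sort_values(time_col, kind="stable").reset_index(drop=True)
    n = len(ordered)
    train_end = int(round(n * ratios[0]))
    val_end = train_end + int(round(n * ratios[1]))
    train = ordered.iloc[:train_end]
    val = ordered.iloc[train_end:val_end]
    test = ordered.iloc[val_end:]
    return train, val, test


# Toy data (hypothetical): ten packages listed newest first.
demo = pd.DataFrame({
    "package_id": range(10),
    "accept_time": pd.to_datetime(
        [f"2022-06-01 {hour:02d}:00:00" for hour in range(17, 7, -1)]
    ),
})

train, val, test = chronological_split(demo, "accept_time")
print(len(train), len(val), len(test))
```

Because the rows are sorted before slicing, every training record precedes every validation record, which in turn precedes every test record.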


Steps to Build ETA Prediction

1. Define the Target Variable

  • For each package in the LaDe-D dataset:

    • Calculate the ETA target variable

  • This is the actual time it took to complete the task.

  • Similarly, if using LaDe-P, y_i would be:
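One way to compute the target in code is shown here. This is a minimal sketch under a stated assumption: the target is taken to be the elapsed time from accept_time to delivery_time for LaDe-D, and from accept_time to pickup_time for LaDe-P. The helper add_eta_target, the eta_minutes column, and the sample rows are hypothetical.

```python
import pandas as pd


def add_eta_target(df, start_col, end_col, target_col="eta_minutes"):
    """Add the elapsed time between two timestamp columns, in minutes."""
    out = df.copy()
    elapsed = pd.to_datetime(out[end_col]) - pd.to_datetime(out[start_col])
    out[target_col] = elapsed.dt.total_seconds() / 60.0
    return out


# Hypothetical LaDe-D style rows (assumed target: delivery_time - accept_time).
deliveries = pd.DataFrame({
    "package_id": [1, 2],
    "accept_time": ["2022-06-04 09:00:00", "2022-06-04 09:30:00"],
    "delivery_time": ["2022-06-04 10:15:00", "2022-06-04 11:00:00"],
})
deliveries = add_eta_target(deliveries, "accept_time", "delivery_time")

# Hypothetical LaDe-P style rows (assumed target: pickup_time - accept_time).
pickups = pd.DataFrame({
    "package_id": [7],
    "accept_time": ["2022-06-04 08:00:00"],
    "pickup_time": ["2022-06-04 08:40:00"],
})
pickups = add_eta_target(pickups, "accept_time", "pickup_time")

print(deliveries["eta_minutes"].tolist(), pickups["eta_minutes"].tolist())
```

Expressing the target in minutes is a design choice for readability; seconds or hours work equally well as long as training and evaluation use the same unit.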

2. Preprocessing

  • Merge Datasets:

    • Combine LaDe-D and LaDe-P if both are used, ensuring proper alignment on package_id and courier_id.

  • Clean Missing Data:

    • Handle missing or null values in columns like pickup_time, delivery_time, or coordinates.

    • Impute or drop rows based on the level of data completeness.

  • Normalize Timestamps:

    • Convert all time-related fields to a consistent format (e.g., datetime).

    • Handle missing years in timestamps using the corresponding ds field as explained earlier.

  • Generate Additional Features:

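    • Extract temporal features (e.g., hour, day of the week, month) from timestamps like accept_time and delivery_time. Note that delivery_time is unknown for unfinished packages at prediction time, so features derived from it should not be used as model inputs.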

    • Extract geographical features such as distance, route complexity, and traffic impact using Trajectory and Road Network datasets.
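The cleaning, timestamp, and feature steps above can be sketched together. This is a minimal sketch with several assumptions: raw timestamps are strings of the form "MM-DD HH:MM:SS", the year is passed in as a parameter (how it is recovered from ds is not shown), and the coordinate column names (accept_lat, accept_lng, lat, lng) as well as the helpers parse_time, haversine_km, and preprocess are hypothetical. Straight-line haversine distance stands in for route-based distance; it does not use the Trajectory or Road Network data.

```python
import numpy as np
import pandas as pd


def parse_time(series, year):
    """Parse 'MM-DD HH:MM:SS' strings, prepending an assumed year."""
    text = str(year) + "-" + series.astype(str)
    # Unparseable or missing values become NaT instead of raising.
    return pd.to_datetime(text, format="%Y-%m-%d %H:%M:%S", errors="coerce")


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points in degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0088 * np.arcsin(np.sqrt(a))


def preprocess(df, year):
    out = df.copy()
    # Normalize timestamps to datetime.
    for col in ("accept_time", "delivery_time"):
        out[col] = parse_time(out[col], year)
    # Drop rows missing any field needed below (imputation is the alternative).
    required = ["accept_time", "delivery_time",
                "accept_lat", "accept_lng", "lat", "lng"]
    out = out.dropna(subset=required).reset_index(drop=True)
    # Temporal features from the accept timestamp.
    out["accept_hour"] = out["accept_time"].dt.hour
    out["accept_dayofweek"] = out["accept_time"].dt.dayofweek  # Monday = 0
    out["accept_month"] = out["accept_time"].dt.month
    # Geographical feature: straight-line distance from accept point to address.
    out["distance_km"] = haversine_km(
        out["accept_lat"], out["accept_lng"], out["lat"], out["lng"]
    )
    return out


# Hypothetical rows; the third has a missing accept_time and is dropped.
raw = pd.DataFrame({
    "package_id": [1, 2, 3],
    "accept_time": ["06-04 09:00:00", "06-04 09:30:00", None],
    "delivery_time": ["06-04 10:15:00", "06-04 11:00:00", "06-04 12:00:00"],
    "accept_lat": [31.2304, 31.2304, 31.2304],
    "accept_lng": [121.4737, 121.4737, 121.4737],
    "lat": [31.2304, 31.2400, 31.2500],
    "lng": [121.4737, 121.4800, 121.4900],
})
clean = preprocess(raw, year=2022)
print(clean[["package_id", "accept_hour", "accept_dayofweek", "distance_km"]])
```

Dropping incomplete rows keeps the sketch short; when a large share of rows is affected, imputing values may preserve more training data.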

Following these steps gives you a well-defined target and a clean, feature-rich dataset on which to train an ETA prediction model.
