ETA Creation
To approach the Estimated Time of Arrival (ETA) prediction task with the dataset provided, let's break the problem down and incorporate its key aspects into your workflow.
Key Concepts in the Statement
Objective:
Input Query:
Output:
Build a model $F$ that maps $q$ to $Y$, predicting arrival times for unfinished packages.
Dataset Usage:
LaDe-D (Delivery dataset) is the primary dataset.
LaDe-P (Pickup dataset) can also be used if additional features are required.
Chronological splitting of data into training, validation, and test sets in a 6:2:2 ratio ensures no data leakage and maintains the temporal order of events.
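The chronological 6:2:2 split can be sketched as follows. This is a minimal illustration using only the standard library; the function name `chronological_split`, the choice of `accept_time` as the ordering key, and the toy records are assumptions made for the example, not part of the dataset's tooling.

```python
from datetime import datetime

def chronological_split(records, time_key="accept_time", ratios=(0.6, 0.2, 0.2)):
    """Sort records by time, then cut them into train/validation/test (no shuffling)."""
    ordered = sorted(records, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    # int() truncates, so any remainder rows end up in the test set.
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

# Toy records (hypothetical values), deliberately out of order.
toy = [
    {"package_id": i, "accept_time": datetime(2023, 5, day, 9, 0)}
    for i, day in enumerate([7, 2, 9, 4, 1, 10, 3, 8, 6, 5])
]
train, val, test = chronological_split(toy)
print(len(train), len(val), len(test))
```

Because the cut points are taken after sorting, every training record precedes every validation record, which in turn precedes every test record. That ordering is what prevents future information from leaking into training.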
Steps to Build ETA Prediction
1. Define the Target Variable
For each package in the LaDe-D dataset:
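Calculate the ETA target variable $y_i$ as the time elapsed between accepting the order and delivering it:

$y_i = \text{delivery\_time}_i - \text{accept\_time}_i$

This is the actual time it took to complete the task.
Similarly, if using LaDe-P, $y_i$ would be:

$y_i = \text{pickup\_time}_i - \text{accept\_time}_i$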
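As a rough sketch of the target computation, the snippet below assumes the target is the elapsed time from `accept_time` to the completion timestamp (`delivery_time` for LaDe-D, `pickup_time` for LaDe-P), expressed in minutes. The helper name `eta_target_minutes`, the choice of minutes as the unit, and the sample records are illustrative assumptions.

```python
from datetime import datetime

def eta_target_minutes(record, start_key="accept_time", end_key="delivery_time"):
    """Elapsed minutes between the start and end timestamps of one package."""
    delta = record[end_key] - record[start_key]
    return delta.total_seconds() / 60.0

# Hypothetical records with already-parsed datetime values.
delivery = {
    "accept_time": datetime(2023, 5, 1, 9, 0),
    "delivery_time": datetime(2023, 5, 1, 10, 30),
}
pickup = {
    "accept_time": datetime(2023, 5, 1, 9, 0),
    "pickup_time": datetime(2023, 5, 1, 9, 45),
}

y_delivery = eta_target_minutes(delivery)                    # 90.0
y_pickup = eta_target_minutes(pickup, end_key="pickup_time")  # 45.0
print(y_delivery, y_pickup)
```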

2. Preprocessing
Merge Datasets:
Combine LaDe-D and LaDe-P if both are used, ensuring proper alignment on package_id and courier_id.
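One way to picture the alignment is an inner join on the two keys. The sketch below uses plain dictionaries and assumes that the same (package_id, courier_id) pair can appear in both datasets; the function name `merge_on_keys`, the `_pickup` suffix for colliding columns, and the sample rows are hypothetical.

```python
def merge_on_keys(delivery_rows, pickup_rows, keys=("package_id", "courier_id")):
    """Inner-join two lists of dict rows on the given key columns."""
    pickup_index = {tuple(row[k] for k in keys): row for row in pickup_rows}
    merged = []
    for d_row in delivery_rows:
        p_row = pickup_index.get(tuple(d_row[k] for k in keys))
        if p_row is None:
            continue  # no matching pickup record: skip (inner join)
        row = dict(d_row)
        for col, value in p_row.items():
            if col in keys:
                continue
            # Suffix columns present in both datasets so nothing is overwritten.
            name = col if col not in row else f"{col}_pickup"
            row[name] = value
        merged.append(row)
    return merged

delivery_rows = [
    {"package_id": 1, "courier_id": 10, "accept_time": "05-01 09:00:00", "delivery_time": "05-01 10:30:00"},
    {"package_id": 2, "courier_id": 11, "accept_time": "05-01 09:10:00", "delivery_time": "05-01 11:00:00"},
]
pickup_rows = [
    {"package_id": 1, "courier_id": 10, "accept_time": "04-30 14:00:00", "pickup_time": "04-30 15:00:00"},
    {"package_id": 3, "courier_id": 12, "accept_time": "04-30 16:00:00", "pickup_time": "04-30 17:00:00"},
]
merged = merge_on_keys(delivery_rows, pickup_rows)
print(merged)
```

An inner join keeps only packages present on both sides. If unmatched delivery rows should be retained, a left join would be the alternative design choice.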
Clean Missing Data:
Handle missing or null values in columns like pickup_time, delivery_time, or coordinates.
Impute or drop rows based on the level of data completeness.
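A simple cleaning policy might look like the sketch below: rows missing a timestamp needed for the target are dropped, while missing coordinates are filled with the median of the observed values. The policy itself, the function name `clean_rows`, and the `lng`/`lat` column names are assumptions chosen for illustration; the right thresholds depend on how complete the data actually is.

```python
from statistics import median

def clean_rows(rows, required=("accept_time", "delivery_time"), imputable=("lng", "lat")):
    """Drop rows missing required fields; median-impute the imputable ones."""
    kept = [r for r in rows if all(r.get(c) is not None for c in required)]
    fills = {}
    for col in imputable:
        observed = [r[col] for r in kept if r.get(col) is not None]
        fills[col] = median(observed) if observed else None
    cleaned = []
    for r in kept:
        row = dict(r)  # copy so the input rows are not mutated
        for col in imputable:
            if row.get(col) is None:
                row[col] = fills[col]
        cleaned.append(row)
    return cleaned

# Hypothetical rows: one lacks delivery_time, one lacks lng.
rows = [
    {"accept_time": "t1", "delivery_time": "t2", "lng": 120.0, "lat": 30.0},
    {"accept_time": "t1", "delivery_time": "t2", "lng": 121.0, "lat": 31.0},
    {"accept_time": "t1", "delivery_time": "t2", "lng": 122.0, "lat": 32.0},
    {"accept_time": "t1", "delivery_time": "t2", "lng": None, "lat": 31.0},
    {"accept_time": "t1", "delivery_time": None, "lng": 119.0, "lat": 29.0},
]
cleaned = clean_rows(rows)
print(len(cleaned), cleaned[3]["lng"])
```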
Normalize Timestamps:
Convert all time-related fields to a consistent format (e.g., datetime).
Handle missing years in timestamps using the corresponding ds field as explained earlier.
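Timestamp normalization could be sketched as below. The example assumes raw time strings arrive either as a full `YYYY-MM-DD HH:MM:SS` value or without the year as `MM-DD HH:MM:SS`; both formats are assumptions for illustration. The year is passed in by the caller, since how it is derived from the ds field is covered elsewhere and is not reproduced here. The function name `normalize_timestamp` is hypothetical.

```python
from datetime import datetime

FULL_FORMAT = "%Y-%m-%d %H:%M:%S"

def normalize_timestamp(raw, year):
    """Parse a time string into a datetime, supplying the year when it is missing."""
    raw = raw.strip()
    try:
        # Already a full timestamp: the supplied year is ignored.
        return datetime.strptime(raw, FULL_FORMAT)
    except ValueError:
        # Year missing (assumed "MM-DD HH:MM:SS"): prepend it before parsing.
        return datetime.strptime(f"{year}-{raw}", FULL_FORMAT)

short = normalize_timestamp("06-04 11:05:00", 2022)
full = normalize_timestamp("2022-06-04 11:05:00", 1999)
print(short, full)
```

Prepending the year before parsing, rather than parsing first and replacing the year afterwards, keeps dates such as February 29 valid in leap years.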
Generate Additional Features:
Extract temporal features (e.g., hour, day of the week, month) from timestamps like accept_time and delivery_time.
Extract geographical features such as distance, route complexity, and traffic impact using Trajectory and Road Network datasets.
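The temporal features and a basic distance feature can be sketched as follows. The great-circle (haversine) distance is used here only as a simple stand-in for distance; route complexity and traffic impact would need the Trajectory and Road Network data and are not covered. The function names and sample values are illustrative assumptions.

```python
import math
from datetime import datetime

def temporal_features(ts):
    """Hour, day of week (Monday = 0), and month of a datetime."""
    return {"hour": ts.hour, "day_of_week": ts.weekday(), "month": ts.month}

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

feats = temporal_features(datetime(2023, 5, 1, 9, 30))
dist = haversine_km(0.0, 0.0, 1.0, 0.0)  # one degree of latitude
print(feats, round(dist, 2))
```

Straight-line distance usually underestimates actual travel distance in a street network, so it is best treated as a baseline feature alongside anything derived from road data.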
By following these structured steps, you should be able to build an accurate and interpretable ETA prediction model.