Preprocessing Using Power BI and Excel


1. Data Preprocessing in Power BI

Steps to Follow:

  1. Load Data into Power BI:

    • Open Power BI Desktop.

    • Use Get Data to import all five city datasets.

    • Append the datasets by selecting Home > Transform Data > Append Queries.

  2. Add a City Column:

    • If the city column is missing, use the Add Column > Custom Column feature in Power Query to include the city name.

  3. Remove Duplicates:

    • In Power Query, use Remove Duplicates for package_id to ensure no duplicate rows.

  4. Handle Missing Values:

    • Replace or remove null values using Transform > Replace Values or Remove Rows.

  5. Create Delivery Duration:

    • Add a custom column in Power Query to calculate delivery duration:

      Duration = [delivery_time] - [accept_time]
  6. Compute Geographical Distance:

    • If distance calculations are needed, you can use DAX formulas after loading the data into the model, or preprocess using approximate calculations for now.

  7. Filter Outliers:

    • Visualize delivery durations and distances in Power BI visuals.

    • Use filters to exclude unrealistic values.

  8. Save the Processed Data:

    • Apply all changes in Power Query, and use Export Data from visuals if a cleaned file is needed.


2. Data Preprocessing in Excel

Steps to Follow:

  1. Merge Data:

    • Copy and paste each city dataset into a master workbook.

    • Add a column to each dataset specifying the city name.

  2. Remove Duplicates:

    • Use Data > Remove Duplicates on package_id.

  3. Handle Missing Values:

    • Use filters or Conditional Formatting to identify and replace/remove null values.

  4. Calculate Delivery Duration:

    • Create a new column:

    • Format the column as time.

  5. Approximate Distance Calculation:

    • Add a formula for Euclidean distance (simplified for flat surfaces):

  6. Detect Outliers:

    • Use Conditional Formatting or a pivot table to highlight extremely high/low delivery durations or distances.

  7. Save the Processed File:

    • Save the cleaned file as a new workbook.


Next Steps

Once the preprocessing is completed in Power BI or Excel:

  1. Validate the data quality by reviewing summary statistics.

  2. Use this cleaned dataset for analysis and modeling.

Last updated