Approximate Distances and Outlier Detection
1. Calculate Approximate Distances
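Install geopy (needed only for the geodesic example):

```shell
pip install geopy
```

Using the Haversine formula (standard library only; note that the arguments are in longitude, latitude order):

```python
import math

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two points given in degrees."""
    # Convert latitude and longitude from degrees to radians
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])

    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    r = 6371  # Mean radius of the Earth in kilometers
    return r * c

# Example: compute the distance between two points (example coordinates)
print(haversine(77.5946, 12.9716, 77.6784, 13.0827))
```

Using geopy's geodesic distance, which works on an ellipsoidal Earth model and is slightly more accurate than the spherical Haversine result (note that the tuples are in latitude, longitude order):

```python
from geopy.distance import geodesic

coord1 = (12.9716, 77.5946)  # (latitude, longitude)
coord2 = (13.0827, 77.6784)
print(geodesic(coord1, coord2).km)
```

Applying the distance to every row of a pandas DataFrame:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'accept_lng': [77.5946, 77.6012],
    'accept_lat': [12.9716, 12.9750],
    'pickup_lng': [77.6784, 77.6400],
    'pickup_lat': [13.0827, 13.0000],
})

# Apply the haversine function (defined above) row-wise
df['distance_km'] = df.apply(
    lambda row: haversine(row['accept_lng'], row['accept_lat'],
                          row['pickup_lng'], row['pickup_lat']),
    axis=1,
)
print(df)
```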
2. Identify and Filter Out Outliers
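The page does not say which outlier rule to use. One common choice is Tukey's interquartile-range (IQR) rule: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as outliers. The sketch below assumes that rule is applied to the `distance_km` column from step 1; the helper name `filter_iqr_outliers` and the sample distances are hypothetical and only illustrate the mechanics.

```python
import pandas as pd

def filter_iqr_outliers(df, column, k=1.5):
    """Hypothetical helper: split rows into (kept, outliers) using Tukey's IQR fences."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = df[column].between(lower, upper)  # inclusive on both ends
    return df[mask], df[~mask]

# Illustrative distances in km (not real data); 250.0 stands out clearly
df = pd.DataFrame({'distance_km': [1.2, 2.5, 3.1, 2.8, 4.0, 3.6, 250.0]})

kept, outliers = filter_iqr_outliers(df, 'distance_km')
print(kept)
print(outliers)
```

The multiplier `k` is a tuning choice: 1.5 is the conventional default, while a larger value such as 3 flags only extreme values.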
Final Steps:
1. Calculating Approximate Distances in Power BI and Excel
Power BI:
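One option, assuming a local Python installation with pandas and NumPy is configured in Power BI Desktop, is Power Query's Run Python script step (Transform tab). That step passes the current table to the script as a pandas DataFrame named `dataset`, and lets you pick any DataFrame left in scope as the output table. The sketch below assumes the table has the same column names as the earlier sample; the fallback sample is hypothetical and only lets the snippet run outside Power BI. The Haversine computation is vectorized over whole columns instead of being applied row by row.

```python
import numpy as np
import pandas as pd

# Inside Power BI, `dataset` already holds the current table, so this block is skipped.
# Outside Power BI, a small hypothetical sample stands in for it.
if 'dataset' not in globals():
    dataset = pd.DataFrame({
        'accept_lng': [77.5946, 77.6012],
        'accept_lat': [12.9716, 12.9750],
        'pickup_lng': [77.6784, 77.6400],
        'pickup_lat': [13.0827, 13.0000],
    })

R_KM = 6371.0  # mean Earth radius in kilometers

# Convert whole columns from degrees to radians
lon1 = np.radians(dataset['accept_lng'])
lat1 = np.radians(dataset['accept_lat'])
lon2 = np.radians(dataset['pickup_lng'])
lat2 = np.radians(dataset['pickup_lat'])

# Haversine formula, element-wise; clip guards against rounding pushing a past 1
a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
dataset['distance_km'] = 2 * R_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Power BI lists the DataFrames left in scope; choose `dataset` as the output table.
```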
Excel:
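Excel has no built-in function for distances between coordinates, but the Haversine formula can be written with RADIANS, SIN, COS, ASIN and SQRT. The formula in the comment below assumes a hypothetical layout with accept_lng, accept_lat, pickup_lng and pickup_lat in columns A to D, starting at row 2. The Python function mirrors it term for term, using the arcsin form of the formula, so its result can be checked against the haversine() output from step 1.

```python
import math

# Hypothetical sheet layout: A = accept_lng, B = accept_lat, C = pickup_lng, D = pickup_lat.
# Excel formula for E2 (distance in km):
# =2*6371*ASIN(SQRT(SIN(RADIANS(D2-B2)/2)^2+COS(RADIANS(B2))*COS(RADIANS(D2))*SIN(RADIANS(C2-A2)/2)^2))

def excel_haversine(accept_lng, accept_lat, pickup_lng, pickup_lat):
    """Term-for-term Python mirror of the Excel formula above; returns kilometers."""
    return 2 * 6371 * math.asin(math.sqrt(
        math.sin(math.radians(pickup_lat - accept_lat) / 2) ** 2             # SIN(RADIANS(D2-B2)/2)^2
        + math.cos(math.radians(accept_lat)) * math.cos(math.radians(pickup_lat))  # COS(..B2)*COS(..D2)
        * math.sin(math.radians(pickup_lng - accept_lng) / 2) ** 2           # SIN(RADIANS(C2-A2)/2)^2
    ))

print(excel_haversine(77.5946, 12.9716, 77.6784, 13.0827))
```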
2. Identifying and Filtering Out Outliers
Power BI:
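Assuming the Run Python script step in Power Query is used again, one option is to flag outliers instead of dropping them. The flagged trips then stay available in the model and can be excluded with a report or page filter. The sketch applies the same 1.5 × IQR rule to a `distance_km` column. The `is_outlier` column name and the fallback sample are hypothetical, and the sample only lets the snippet run outside Power BI.

```python
import pandas as pd

# Inside Power BI, `dataset` is supplied by the Run Python script step.
# Outside Power BI, a hypothetical sample of distances (km) stands in for it.
if 'dataset' not in globals():
    dataset = pd.DataFrame({'distance_km': [1.2, 2.5, 3.1, 2.8, 4.0, 3.6, 250.0]})

# Tukey fences from the first and third quartiles
q1, q3 = dataset['distance_km'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# True outside the fences; rows with a missing distance are also flagged,
# because between() returns False for NaN
dataset['is_outlier'] = ~dataset['distance_km'].between(lower, upper)
```

Keeping a flag instead of filtering in Python means the threshold's effect can be checked visually before excluding any rows.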
Excel:
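In Excel, the same IQR rule can be built from QUARTILE.INC. Assume a hypothetical layout with distances in E2:E8, Q1 in G1 (=QUARTILE.INC($E$2:$E$8,1)), Q3 in G2 (=QUARTILE.INC($E$2:$E$8,3)), the lower fence in G3 (=G1-1.5*(G2-G1)), the upper fence in G4 (=G2+1.5*(G2-G1)), and a flag in F2 (=OR(E2<$G$3,E2>$G$4)) filled down. AutoFilter on column F then hides the flagged rows. QUARTILE.INC interpolates linearly between sorted values, and the sketch below reproduces that calculation in Python so the helper cells can be cross-checked. The function name `quartile_inc` and the sample values are hypothetical.

```python
import math

def quartile_inc(values, quart):
    """Mirror of Excel's QUARTILE.INC(array, quart) for quart in 0..4 (linear interpolation)."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3 or 4")
    xs = sorted(values)
    pos = quart / 4 * (len(xs) - 1)   # zero-based fractional rank
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

# Illustrative distances in km (not real data), standing in for E2:E8
distances = [1.2, 2.5, 3.1, 2.8, 4.0, 3.6, 250.0]

q1 = quartile_inc(distances, 1)   # G1
q3 = quartile_inc(distances, 3)   # G2
lower = q1 - 1.5 * (q3 - q1)      # G3
upper = q3 + 1.5 * (q3 - q1)      # G4
flags = [d < lower or d > upper for d in distances]  # F2:F8
print(q1, q3, lower, upper)
print(flags)
```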