Data Cleaning Using SQL

Project Overview: In this project, I performed comprehensive data cleaning and transformation on the Nashville Housing dataset using SQL. The goal was to enhance the dataset's usability for analysis by addressing various data quality issues.

Skills and Concepts

  • Standardizing Date Formats: Converted inconsistent date formats to a standardized date format using the CONVERT function. Added a new column SaleDateConverted and populated it with converted date values.
  • Populating Missing Property Address Data: Identified and populated missing property addresses by joining the dataset on ParcelID and using the ISNULL function to fill in gaps.
  • Breaking Out Address into Individual Columns: Split the PropertyAddress field into separate columns for address, city, and state using string functions like SUBSTRING and CHARINDEX. Added and updated columns PropertySplitAddress, PropertySplitCity, and PropertySplitState.
  • Handling Owner Address Data: Applied similar transformations to the OwnerAddress field, breaking it into OwnerSplitAddress, OwnerSplitCity, and OwnerSplitState.
  • Updating Boolean Fields: Standardized boolean fields, converting 'Y' and 'N' values to 'Yes' and 'No' respectively, using the CASE statement.
  • Removing Duplicates: Implemented a Common Table Expression (CTE) to identify and remove duplicate records based on key fields like ParcelID, PropertyAddress, SalePrice, SaleDate, and LegalReference.
  • Deleting Unused Columns: Dropped unnecessary columns such as OwnerAddress, TaxDistrict, PropertyAddress, and SaleDate to streamline the dataset.

Conclusion

This project demonstrates essential data cleaning techniques and SQL proficiency, valuable for data analysis and business intelligence. Key tasks included standardizing date formats, populating missing property addresses, splitting address fields into individual columns, handling owner address data, standardizing boolean fields, removing duplicates, and deleting unused columns.

GitHub Project

You can view the full project on GitHub here.

Back to Portfolio