COVID-19 Data Analysis Using Python
Exploratory Data Analysis Project
Project Overview: In this project, I used Python to analyze the global impact of the COVID-19 pandemic using data from Johns Hopkins University. The primary objective was to explore the relationship between the spread of the virus in each country and the happiness indices reported by that country's citizens. This involved preprocessing and merging two distinct datasets: the COVID-19 confirmed cases dataset and the World Happiness Report.
Technical Description
- Data Import and Preparation: The project uses Pandas and NumPy for data manipulation and Seaborn and Matplotlib for visualization. The COVID-19 confirmed cases dataset was imported using Pandas. Initial inspection involved displaying the first 10 rows to understand the dataset's structure and checking its dimensions. Columns not needed for the analysis (Lat, Long) were removed.
- Data Aggregation: Data was aggregated by country to facilitate comparison and analysis at a national level. Missing values were handled appropriately to ensure the dataset's integrity.
- Data Analysis: Measures such as the maximum infection rate for each country were calculated to quantify the spread of the virus. The World Happiness Report data was then merged with the COVID-19 dataset to analyze the correlation between infection rates and life factors such as GDP, social support, and personal freedoms.
- Data Visualization: Seaborn and Matplotlib were used to create the graphs. Line plots, bar charts, and heatmaps illustrate the correlations between the virus's spread and national happiness scores and make the statistical results easier to interpret.
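The steps above can be sketched in a few lines of Pandas. This is a minimal illustration rather than the project's actual code, and it rests on several assumptions: the column names `Province/State`, `Country/Region`, `Country name`, `Logged GDP per capita`, and `Social support` are assumed from the public datasets; "maximum infection rate" is interpreted here as the largest one-day increase in cumulative confirmed cases; and the function names and the small input tables are hypothetical placeholders, not real figures.

```python
import pandas as pd


def max_infection_rate(confirmed: pd.DataFrame) -> pd.DataFrame:
    """Return the largest one-day increase in confirmed cases per country.

    Assumes one row per province/country and one column per date holding
    cumulative confirmed counts.
    """
    # Drop the coordinate columns, then sum provinces into one row per country.
    # numeric_only=True leaves out the text column "Province/State".
    by_country = (
        confirmed.drop(columns=["Lat", "Long"])
        .groupby("Country/Region")
        .sum(numeric_only=True)
    )
    # Differencing along the date axis turns cumulative counts into daily new cases.
    daily_new = by_country.diff(axis=1)
    return daily_new.max(axis=1).rename("max_infection_rate").to_frame()


def correlate_with_happiness(
    rates: pd.DataFrame, happiness: pd.DataFrame, country_col: str = "Country name"
) -> pd.Series:
    """Inner-join on country and correlate each life factor with the infection rate."""
    merged = rates.join(happiness.set_index(country_col), how="inner")
    corr = merged.corr(numeric_only=True)["max_infection_rate"]
    return corr.drop("max_infection_rate")


# Made-up placeholder tables (not real data), shaped like the two datasets.
confirmed = pd.DataFrame(
    {
        "Province/State": [None, "P1", "P2", None],
        "Country/Region": ["A", "B", "B", "C"],
        "Lat": [0.0, 0.0, 0.0, 0.0],
        "Long": [0.0, 0.0, 0.0, 0.0],
        "1/22/20": [0, 1, 0, 2],
        "1/23/20": [1, 2, 1, 2],
        "1/24/20": [4, 2, 3, 10],
        "1/25/20": [5, 5, 3, 11],
    }
)
happiness = pd.DataFrame(
    {
        "Country name": ["A", "B", "C", "D"],
        "Logged GDP per capita": [9.0, 10.0, 11.0, 8.0],
        "Social support": [0.8, 0.9, 0.7, 0.6],
    }
)

rates = max_infection_rate(confirmed)
corr = correlate_with_happiness(rates, happiness)
print(rates)
print(corr)
```

The inner join keeps only countries present in both tables, so a country that is spelled differently in the two sources would be dropped silently. The correlation values could then be passed to a Seaborn heatmap or scatter plot for the visualization step.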
Outcome
The project highlighted potential relationships between COVID-19 infection data and reported happiness, and it strengthened my data analysis and visualization skills. The analysis is relevant for anyone interested in the broader social impact of the COVID-19 pandemic.
Conclusion
This project demonstrates essential data analysis and visualization techniques using Python, showcasing the ability to derive actionable insights from complex datasets. It emphasizes the importance of data preparation, integration, and visualization in deriving meaningful conclusions that can inform public health and policy decisions.
GitHub Project
You can view the full project on GitHub here.