SciTech-Mathematics-Probability+Statistics-Descriptive Statistics I + II (using Python) and Data Visualization
Learn Stats for Python: Descriptive Statistics I
Learn Stats for Python: Descriptive Statistics II + Data Visualization
BY IVÁN PALOMARES CARRASCOSA, POSTED ON AUGUST 28, 2024
In today's world, pervaded by data and AI-driven technologies and solutions, mastering their foundations is a gateway to unlocking powerful insights from data and making effective, reliable data-driven decisions.
One such family of foundational notions comes from statistics. Given Python's versatility and its popularity in data analysis and AI applications, learning stats with the aid of the Python programming language is an ideal way to learn statistical concepts and put them into practice at the same time!
This comprehensive tutorial series, consisting of five parts, curates and links together these "learn stats for Python" tutorials, providing you with a strong foundational learning pathway in both programming and statistics. Each tutorial is designed to be short, straight to the point, and easy to digest.
Descriptive Statistics I
Part I of the series focuses on tutorials to get started with the most essential pillar of statistics: descriptive statistics. Descriptive statistics encompass tools and techniques to summarize and describe the main characteristics of a dataset, such as its distribution, variability, and central tendency.
1. Data Preparation Essentials
The first group of tutorials we curated for you focuses on the essential preliminary steps needed before conducting any statistical analysis (even the most basic ones). These steps include cleaning, normalizing, and transforming your initial data to ensure consistent and accurate analysis results thereafter. Data preparation is essential as it sets the foundation for any subsequent statistical computations. These selected tutorials illustrate how to use Python to perform some of the most frequent data preparation steps:
normalize data in Python
remove outliers in Python
perform data binning in Python
transform data in Python
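As a rough illustration of the four steps listed above, here is a minimal sketch using pandas and NumPy on a small made-up series. The sample values, the 1.5 × IQR outlier rule, the three equal-width bins, and the log transform are all assumptions chosen for this example; the linked tutorials may use different techniques.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 95.0 is a deliberately extreme value
values = pd.Series([12.0, 15.0, 14.0, 10.0, 13.0, 95.0, 11.0, 16.0])

# 1) Remove outliers with the 1.5 * IQR rule (one common convention)
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
cleaned = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]

# 2) Min-max normalization: rescale the remaining values to [0, 1]
normalized = (cleaned - cleaned.min()) / (cleaned.max() - cleaned.min())

# 3) Binning: split the range into three equal-width intervals
binned = pd.cut(cleaned, bins=3, labels=["low", "medium", "high"])

# 4) Transformation: natural log (only valid for strictly positive data)
logged = np.log(cleaned)

print(cleaned.tolist())
print(normalized.round(2).tolist())
print(binned.value_counts().to_dict())
```

Removing outliers before normalizing matters here: with the extreme value left in, min-max scaling would squeeze every other observation into a narrow band near 0.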
2. Descriptive Statistics Fundamentals
Now it's time to plunge into pure stats.
The following group of tutorials covers the central notions of descriptive statistics, that is, summarizing and describing the main characteristics of your (previously prepared) data: mean, median, variability, skewness, percentiles, and more.
It is important to understand these essentials to uncover the "appearance" of your data: the first step toward interpreting and communicating insightful patterns underlying them.
These two tutorials showcase the calculation of mean, median, and mode, using two different Python libraries:
How to calculate mean, median, and mode with numpy
How to calculate mean, median, and mode in pandas
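To give a flavor of what these two tutorials cover, here is a minimal sketch computing the three measures with both libraries on a made-up list. NumPy has no dedicated mode function, so the sketch derives the mode from `np.unique`; this is one possible approach, not necessarily the one used in the tutorials.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = [2, 3, 3, 5, 7, 10]

# NumPy: mean and median are built in
arr = np.array(data)
mean_np = np.mean(arr)
median_np = np.median(arr)

# NumPy: derive the mode from the value counts
# (if several values tie, argmax keeps the smallest one)
unique_vals, counts = np.unique(arr, return_counts=True)
mode_np = unique_vals[np.argmax(counts)]

# pandas: all three are Series methods; mode() returns every tied value
s = pd.Series(data)
mean_pd, median_pd, modes_pd = s.mean(), s.median(), s.mode().tolist()

print(mean_np, median_np, mode_np)   # 5.0 4.0 3
print(mean_pd, median_pd, modes_pd)  # 5.0 4.0 [3]
```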
Meanwhile, other basic statistical properties are covered in these tutorials:
Calculate Sample & Population Variance in Python
Calculate the Standard Deviation of a List in Python
Calculate Skewness & Kurtosis in Python
Calculate Percentiles in Python
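The properties listed above can be sketched in a few lines. The example below uses made-up data and relies on NumPy for variance, standard deviation, and percentiles, and on pandas for skewness and kurtosis (pandas reports bias-corrected sample skewness and excess kurtosis). The choice of libraries is an assumption made to keep dependencies small; the tutorials may use others.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (mean is 5)
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Population variance divides by n; sample variance divides by n - 1
pop_var = np.var(data)             # ddof=0 is NumPy's default
sample_var = np.var(data, ddof=1)

# Standard deviation is the square root of the variance
pop_std = np.std(data)
sample_std = np.std(data, ddof=1)

# Skewness and excess kurtosis via pandas
s = pd.Series(data)
skewness = s.skew()
kurtosis = s.kurt()

# 25th, 50th, and 75th percentiles (linear interpolation by default)
p25, p50, p75 = np.percentile(data, [25, 50, 75])

print(pop_var, pop_std)   # 4.0 2.0
print(round(sample_var, 3), round(sample_std, 3))
print(round(skewness, 3), round(kurtosis, 3))
print(p25, p50, p75)      # 4.0 4.5 5.5
```

Note the `ddof` argument: NumPy defaults to the population formula, whereas pandas' `Series.var()` and `Series.std()` default to the sample formula, so the two libraries can disagree on the same data unless `ddof` is set explicitly.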
3. Frequencies and Distributions
After learning to calculate the most common statistics used to describe the characteristics of your data, the next natural step is to learn mechanisms to analyze the distribution of the data, be it across categories or through intervals. Accordingly, the next few tutorials will teach you how to build frequency tables, calculate relative frequencies from absolute frequencies, and work with contingency tables to summarize relationships between categorical variables, among others.
Create Frequency Tables in Python
Calculate Relative Frequency in Python
Create a Contingency Table in Python
Calculate Expected Value in Python
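As a small sketch of these ideas, the example below builds a frequency table, relative frequencies, and a contingency table with pandas, then computes an expected value with NumPy. The column names (`grade`, `group`) and all values are hypothetical, invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame({
    "grade": ["A", "B", "A", "C", "B", "A"],
    "group": ["x", "x", "y", "y", "x", "y"],
})

# Absolute frequencies: how often each category occurs
freq = df["grade"].value_counts()

# Relative frequencies: absolute frequency divided by the total count
rel_freq = df["grade"].value_counts(normalize=True)

# Contingency table: joint counts of two categorical variables
contingency = pd.crosstab(df["grade"], df["group"])

# Expected value of a discrete variable: sum of outcome * probability
outcomes = np.array([0, 1, 2])
probs = np.array([0.2, 0.5, 0.3])  # must sum to 1
expected_value = np.sum(outcomes * probs)

print(freq.to_dict())      # {'A': 3, 'B': 2, 'C': 1}
print(rel_freq.round(3).to_dict())
print(contingency)
print(expected_value)
```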
4. Correlation and Covariance Metrics
To finalize part I of this tutorial series, let's put together some tutorials aimed at exploring correlation and covariance measures.
These are key statistical metrics to analyze and explore the relationship between variables in your data. The tutorials below illustrate how to calculate and interpret several types of correlations, perform correlation tests, and build correlation and covariance matrices for uncovering hidden connections between parts of your data. These tools are very relevant and constitute part of the foundations behind predictive modeling, AI, and machine learning systems: after all, making predictions and inferences intelligently entails discovering the hidden relationships in our data.
Calculate Correlation in Python
Calculate Spearman Rank Correlation in Python
Perform a Correlation Test in Python
Create a Correlation Matrix in Python
Create a Covariance Matrix in Python
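To make the difference between these measures concrete, here is a minimal sketch with pandas on made-up data where `y` is a monotonic but nonlinear function of `x`. Spearman correlation is computed here as the Pearson correlation of the ranks, which is its definition; this avoids extra dependencies and is an assumption of this sketch rather than the tutorials' method. A significance test (for example with `scipy.stats.pearsonr`) is left out for the same reason.

```python
import pandas as pd

# Hypothetical data: y = x ** 3, monotonic but not linear
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["y"] = df["x"] ** 3

# Pearson correlation matrix: measures linear association
pearson_matrix = df.corr()
pearson_xy = pearson_matrix.loc["x", "y"]

# Spearman rank correlation: Pearson correlation computed on the ranks
spearman_matrix = df.rank().corr()
spearman_xy = spearman_matrix.loc["x", "y"]

# Sample covariance matrix (divides by n - 1)
cov_matrix = df.cov()

print(round(pearson_xy, 4))   # below 1: the relationship is not linear
print(round(spearman_xy, 4))  # 1.0: the relationship is perfectly monotonic
print(cov_matrix)
```

Unlike correlation, covariance depends on the units of the variables, which is why the covariance matrix entries are not bounded between -1 and 1.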
Coming Up Next
In the next post in this series, we will wrap up with additional and advanced descriptive statistics topics, and move on to statistical data visualization tools.
Descriptive Statistics II
Part II of the series continues introducing more important concepts from descriptive statistics, namely similarity and distance measures, and provides a brief exploration of some advanced and applied topics, predominantly under an exploratory data analysis viewpoint, such as clustering. After this, we move on to Python tutorials covering data visualization techniques.
1. Similarity and Distance Measures
Methods to quantify the similarity or dissimilarity between data points, samples, or populations are crucial in a variety of statistical analysis methodologies and machine learning techniques: clustering, pattern recognition, classification, etc.
The following tutorials teach you how to use Python to apply metrics that indicate how close or far apart your data points are. Mastering these metrics is key to being able to compare datasets, group similar data points together in a coherent manner, or detect anomalies or data points that significantly deviate from the rest.
calculate Euclidean distance in Python
calculate Manhattan distance in Python
calculate Jaccard similarity in Python
calculate Mahalanobis distance in Python
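The four measures above can be sketched with NumPy alone. The vectors, sets, and sample points below are made up, and the helper functions `jaccard_similarity` and `mahalanobis` are hypothetical names written for this example; dedicated libraries offer ready-made equivalents.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: straight-line distance between two points
euclidean = np.linalg.norm(a - b)

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.abs(a - b).sum()

def jaccard_similarity(s, t):
    """Size of the intersection over size of the union (non-empty sets)."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

jaccard = jaccard_similarity([1, 2, 3, 4], [3, 4, 5, 6])

def mahalanobis(x, data):
    """Distance from point x to the mean of data, scaled by its covariance."""
    data = np.asarray(data, dtype=float)
    diff = np.asarray(x, dtype=float) - data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # sample covariance, columns = variables
    # Solve cov @ z = diff instead of inverting cov explicitly
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical 2-D sample with a non-singular covariance matrix
sample = [[2, 2], [4, 6], [6, 4], [8, 8]]
maha = mahalanobis([8, 8], sample)

print(euclidean, manhattan)  # 5.0 7.0
print(round(jaccard, 3))     # 0.333
print(round(maha, 4))
```

Mahalanobis distance differs from the other three in that it measures how far a point lies from a distribution, accounting for the spread and correlation of the variables, rather than the gap between two individual points.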
2. Advanced and Applied Topics
Now that you have learned the foundations of descriptive statistics, and before moving on to the next topic in this journey (data visualization), it’s the perfect time to get a glimpse of some more complex statistics-based techniques and practical applications. This way, you’ll gain some insight into specialized methodologies commonly used in data science and analytics. In part V of this series, we will put the lens on more advanced solutions for prediction and forecasting. But for now, let’s cover some tutorials aimed at guiding you through clustering data, univariate and bivariate analysis, and multidimensional scaling. These are common methods for solving real-world challenges requiring some statistical rigor.
Perform K-Means Clustering in Python
Use the Elbow method in Python
Perform Multidimensional Scaling in Python
Perform Univariate Analysis in Python
Perform Bivariate Analysis in Python
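To show how K-means and the elbow method fit together, here is a from-scratch sketch in NumPy. It is not the approach of the linked tutorials, which may well rely on a library such as scikit-learn. The function name `kmeans`, the sample points, and the naive deterministic initialization (evenly spaced rows instead of random or k-means++ seeding) are assumptions made so the example stays short and reproducible.

```python
import numpy as np

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a naive deterministic initialization."""
    points = np.asarray(points, dtype=float)
    # Assumption: seed centroids with k evenly spaced rows of the data
    idx = np.linspace(0, len(points) - 1, k).astype(int)
    centroids = points[idx].copy()
    labels = None
    for _ in range(max_iter):
        # Squared distance from every point to every centroid, shape (n, k)
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its members (skip empty clusters)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Inertia: total squared distance of points to their assigned centroid
    inertia = ((points - centroids[labels]) ** 2).sum()
    return labels, centroids, inertia

# Hypothetical data: two well-separated groups of four points each
points = [[1, 1], [1, 2], [2, 1], [2, 2],
          [8, 8], [8, 9], [9, 8], [9, 9]]

# Elbow method: compute inertia for several k and look for the bend
inertias = {k: kmeans(points, k)[2] for k in (1, 2, 3)}
labels2, centroids2, _ = kmeans(points, 2)

print(inertias)
print(labels2)
```

On this data the inertia drops sharply from k = 1 to k = 2 and only marginally from k = 2 to k = 3, which is the "elbow" suggesting two clusters.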
Data Visualization
1. Basic Data Visualizations
Visualizing data is a valuable way of understanding what the data looks like and discovering the key patterns and trends it exhibits. Choosing the right visualization or chart type heavily depends on the nature of your data and what properties of the data you want to display. The following tutorials cover some foundational plotting techniques deemed essential for visualizing data distributions and relationships clearly and effectively. They combine the use of several well-known Python libraries for data visualization, such as seaborn and matplotlib, as well as pandas for handling data structures.
Create barplots in seaborn
Create a stacked bar chart in Pandas
Create a Histogram from a Pandas Series
Create a Relative Frequency Histogram in Matplotlib
Create a Pie Chart in Seaborn
Create a Scatter Plot from a Pandas DataFrame
Create Heatmaps in Seaborn
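As one example from this list, a relative frequency histogram can be sketched in matplotlib by weighting each observation by 1/n, so that bar heights sum to 1 instead of showing raw counts. The randomly generated data and the choice of 10 bins are assumptions for illustration, and the non-interactive `Agg` backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 200 draws from a standard normal distribution
rng = np.random.default_rng(0)
data = rng.normal(size=200)

# Weight each observation by 1/n so bar heights are relative frequencies
weights = np.full(len(data), 1.0 / len(data))

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(data, bins=10, weights=weights, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Relative frequency")
ax.set_title("Relative frequency histogram")

# Call plt.show() or fig.savefig(...) to view the figure
print(heights.sum())
```

This differs from `density=True`, which scales the bars so that their total area (not their total height) equals 1.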
2. Advanced Data Visualization
To go one step further in creating powerful and insightful data visualizations, try exploring these Python tutorials that showcase the creation of more specialized and complex plots. Some of these plots are commonly used in machine learning modeling processes and model evaluation, namely for predictive solutions like classifiers and regression models. These plots can be an interesting discovery for those who have already mastered the basics of data visualization.
create a Pareto chart in Python
create a Bell curve in Python
Perform a Correlation Test in Python
Plot a ROC Curve in Python
Plot a Logistic Regression curve in Python
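To illustrate one of these plots, here is a minimal Pareto chart sketch: bars sorted from most to least frequent category, overlaid with a cumulative percentage line on a secondary axis. The defect categories and counts are invented for the example, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical defect counts by category
counts = pd.Series({
    "dent": 25, "scratch": 40, "other": 5,
    "misalignment": 20, "discoloration": 10,
})

# A Pareto chart orders categories from most to least frequent
counts = counts.sort_values(ascending=False)

# Cumulative share of the total, in percent
cum_pct = counts.cumsum() * 100 / counts.sum()

fig, ax = plt.subplots()
ax.bar(list(counts.index), counts.values)
ax.set_ylabel("Count")

# Second y-axis sharing the same x-axis for the cumulative line
ax2 = ax.twinx()
ax2.plot(list(counts.index), cum_pct.values, color="tab:red", marker="o")
ax2.set_ylim(0, 110)
ax2.set_ylabel("Cumulative %")

# Call plt.show() or fig.savefig(...) to view the figure
print(cum_pct.to_dict())
```

Reading the cumulative line shows how few categories account for most of the total: in this made-up data, the top three categories cover 85% of all defects.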
Coming Up Next
In the next post in this series, you'll be able to learn how to deal with probabilities and probability distributions in Python, and perform a variety of data sampling techniques.
 
                    
                
 
                
            
         