Data Analytics

This course covers the fundamentals of data analytics, including statistics, data processing, and machine learning. You will learn how to work with Python, SQL, Excel, and Power BI for data analysis and visualization.

Objective

  • Understand basic and advanced statistics for data analysis.
  • Learn data collection, web scraping, and data processing techniques.
  • Use Python and SQL for data manipulation and analysis.
  • Visualize data using Excel and Power BI.
  • Build predictive models and apply machine learning concepts.

Curriculum

Basic Statistics

  • What is Statistics?
  • Type of statistics.
  • Difference between Population and Sample
  • What is Descriptive Statistics?
  • What is the Measure of Central Tendency? (Mean, Median, Mode)
  • What is Measure of Variability? (Range, IQR, Variance, Standard Deviation)
  • What is Normal Distribution?

Advanced Statistics

  • What is Inferential Statistics?
  • What is Hypothesis Testing? (Level of Significant, Confidence Interval, Test Statistic)
  • What is ANOVA?
  • What is Regression?
  • What is Correlation?
  • Type of Correlation.

Introduction to Data Analytics

  • What is Data Analytics?
  • Data Analysis Steps?
  • Type of Data Analytics
  • Data Analytics Use Cases
  • Lifecycle of Data Analytics
  • Benefits of Data Analytics
  • Data Analytics Techniques
  • Data Analytics Tools
  • How to Become a Data Analyst?
  • Skills needed to become a Data Analyst
  • Data Analyst and Data Scientist

Python I: Fundamental

  • Installation Python
  • Tool to use (Jupyter)
  • Array
  • Matrix Operation
  • Dictionaries

Data Collection & Web Scraping

  • Introduction to data collection
  • Introduction to Web Scraping & Web Crawling
  • HTML parsing with BeautifulSoup
  • Scrapy Fundamental
  • XPath with Scrapy
  • Scrapy with Selenium
  • Scrapy with Splash

SQL for Analytics

  • Introduction to SQL
  • Database Normalization and Entity Relationship Model
  • SQL Operators
  • Working with SQL: Join, Tables, and Variables
  • Working with Subqueries​ (​Nested subqueries, Correlated subqueries, Common Table Expressions)
  • SQL Views, Stored Procedures, Triggers
  • Deep Dive into SQL Functions
  • Deep Dive into User-defined Functions

Excel for Data Analytics

  • Why using Excel?
  • Formula and Function

Excel for Data Analytics ('Data Preparation')

  • Load Data into Excel (File (txt,csv), Database)
  • What is power query?
  • Why need power query?
  • How to use power query to clean and transform data?
  • Data Validation
  • Sort and Filtering
  • Conditional Formatting
  • All Kind of Count in Excel

Excel for Data Analytics ('Data Extraction')

  • Split text
  • Auto fill
  • Transpose
  • Concatenate
  • Trim
  • Change Data Type (Convert Data)

Data Analysis with Excel

  • What if Analysis in Excel?
  • Goal Seek
  • Scenario Manager
  • Data Table
  • Solver
  • Histogram
  • Correlation
  • Rank and Percentile
  • Descriptive Statistics (Regression in Excel)
  • Forecast Sheet

Data Visualization with Excel

  • Pivot Table, Chart, Slicer
  • Create and Modify Pivot Tables
  • Use Pivot Tables to solve a variety of Business Problems
  • Create an Interactive Dashboard with Pivot Charts and Slicers
  • Bring data to life using conditional formatting & custom number formatting
  • Mastering charts in Excel
  • Complete the dashboard with creative visualizations and dynamic charts (Interaction Dashboard)

Data Warehouse (Ready Data to Analyze)

  • Data Warehouse Overview
  • Type of Data Warehouse
  • Data Warehouse architecture
  • Data Warehouse Characteristics
  • Data Warehouse vs Database
  • ETL and ELT

Power BI

  • Introduction to Power BI
  • Understanding Power BI
  • Power BI Desktop and the Power BI service
  • Creating a Power BI account

Data Preparation and Modeling With Power BI

  • Creating reports and dashboards
  • Advanced features (Drillthrough, Slicers, Filters, etc.)
  • Sharing and publishing reports
  • RLS (Role Level Security) in Power BI

Data Visualization With Power BI

  • Creating reports and dashboards
  • Using visualizations such as charts, tables, and maps
  • Creating custom visuals and using marketplace visuals
  • Designing and formatting reports

Advanced Features of Power BI

  • Creating and managing data hierarchies
  • Using drillthrough and drilldown features
  • Creating dynamic reports with slicers, filters, and bookmarks
  • Using Power BI mobile app
  • Sharing and publishing reports to Power BI service and dashboards
  • Workspace in Power BI
  • Role User in Power BI workspace
  • RLS (Role Level Security) in Power BI and why do we need it?
  • Sharing and collaboration (Workspace)
  • Using the Power BI service to collaborate with others
  • Setting up security and permissions for reports and data

Python II: Working With Libraries

  • Dataframe of Pandas with Python
  • Series of Pandas with Python
  • Access Data from Dataframe of Pandas
  • Data Processing with Pandas (Load Data into Pandas and Clean)
  • Dynamic Array of Numpy with Python
  • Create Chart, Plots Using matplotlib with Python

Exploratory Data Analysis

  • What is Exploratory Data Analysis (EDA)?
  • Why is it important?
  • Type of EDA
  • What is cross validation?

Feature Engineering (Data Processing)

  • Feature Selection
  • Handling missing values
  • Handling imbalanced data
  • Handling outliers
  • Binning
  • Encoding
  • Feature Scaling

Predictive Modeling

  • Machine learning Algorithms
  • Type of machine learning
  • Regression models
  • Decision Tree

Time Series Forecasting

  • Introduction to time-series
  • Correlation Forecasting
  • Autoregressive models

Data Processing with PySpark

  • Introduction to time-series
  • Correlation Forecasting
  • Installing PySpark
  • Spark Structure API DataFrame
  • RDD
  • Spark Architecture and Components
  • Tuple
  • Pair RDD
  • FlatMaps and Filters
  • Sorts and Coalesce
  • Joins
  • Spark SQL
  • Datasets
  • DataFrames API
  • Pivot Tables
  • User Defined Function
  • HashAggregation
  • SparkML for Machine Learning
  • Linear Regression Model
  • Training Data
  • Model Fitting Parameters
  • Feature Selection
  • Non-Numeric Data
  • Pipelines
  • K Means Clustering
  • Spark Streaming and Structured Streaming with Kafka

Business Problem Solving, Insights and Storytelling

  • Business domains
  • Understanding the business problem and formulating hypotheses
  • Exploratory data analysis
  • Data storytelling
  • Project on deriving business insights and storytelling

Business Case Studies

  • Real world Business Analytics
  • Focus on Real world problem
  • Business Required