Informatica Tutorial: Beginner to Expert Level from Udemy

What's inside

Learning objectives

Understand data warehouse concepts and ETL concepts
Describe the Informatica PowerCenter architecture and its different components
Use PowerCenter 10.x/9.x components to build mappings, tasks, and workflows
Describe the basic and advanced features of PowerCenter 10.0/9.6 transformations
Understand workflow tasks and job handling
Describe mapping parameters and variables
Perform debugging, troubleshooting, error handling, and recovery
Learn the different types of cache available, how to calculate cache requirements, and how to implement session cache
Execute performance tuning and optimization
Identify and explain the functionality of the Repository Manager tool
Identify how to handle services in the Administration Console
Understand techniques for SCD, XML processing, partitioning, constraint-based loading, and incremental aggregation
Gain insight into ETL best practices using Informatica
Understand the basic interview questions for all the transformations and real-time scenarios

Syllabus

This section outlines what we are getting into and what you need to start this course. It also gives a basic overview of what ETL is all about, with real-time examples.

Thank you and welcome to this course.

In this lecture, I give a brief perspective on what you are going to get into and what you will get out of this course.

In this section we will go through the Data Warehouse Concepts and Dimensional Modeling.

The concept of data warehousing is not hard to understand. The notion is to create a permanent storage space for the data needed to support reporting, analysis, and other BI functions. In this lecture we look at the main reasons for creating a data warehouse and its benefits.

This long list of benefits is what makes data warehousing an essential management tool for businesses that have reached a certain level of complexity.

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information to help corporate executives, business managers and other end users make more informed business decisions.

Business intelligence (BI) is the use of computing technologies for the identification, discovery and analysis of business data - like sales revenue, products, costs and incomes.

BI technologies provide current, historical, and predictive views of internally structured data for products and departments, enabling more effective decision-making and strategic operational insights through functions like online analytical processing (OLAP), reporting, predictive analytics, data/text mining, benchmarking, and Business Performance Management (BPM). These technologies and functions are often referred to as information management.

Data Warehouse Concepts play a critical role in all Data Warehouse and ETL projects. This course includes the content you need to get started.

But if you want in-depth knowledge of the foundations of Data Warehouse Concepts, you can enroll in the course mentioned in the lecture.

Let's answer a few basic questions about Data Warehousing and Business Intelligence.

This section covers all the baseline architectures for setting up an Enterprise Data Warehouse.

In this lecture we see how the Centralized architecture is set up, in which a single data warehouse stores all the data needed for business analysis.

In a Federated Architecture, the data is logically consolidated but stored in separate physical databases, at the same or at different physical sites. The local data marts store only the information relevant to a department.

The amount of data is reduced compared with a central data warehouse, while the level of detail is enhanced in this kind of model.

A Multi-Tiered architecture is a distributed data approach. This process cannot be done in one step, because many sources have to be integrated into the warehouse.

Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture rather than discussing the specifics of any one system.

In general, all data warehouse systems have the following layers:

Data Source Layer
Data Extraction Layer
Staging Area
ETL Layer
Data Storage Layer
Data Logic Layer
Data Presentation Layer
Metadata Layer
System Operations Layer

The staging area is where data is stored before being scrubbed and transformed into a data warehouse / data mart. Having one common area makes subsequent data processing / integration easier. Depending on the business architecture and design, there can be more than one staging area, and these may follow different naming conventions.

Let's review your understanding of Data Warehouse Architectures

This section talks about ODS, OLAP, and OLTP, and the differences between them

An ODS is designed for relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical of the data warehouse.

An ODS is similar to your short term memory in that it stores only very recent information; in comparison, the data warehouse is more like long term memory in that it stores relatively permanent information.

To understand the purpose of the ODS and when it is an appropriate solution, its characteristics must first be defined.

Characteristics of an Operational Data Store

Subject Oriented: The ODS contains specific data that is unique to a set of business functions. The data therefore represents a specific subject area.

Integrated: Data in the ODS is sourced from various legacy applications. The source data is taken through a set of ETL operations that include cleansing and transformative processes. These processes are based on rules that have been created through business requirements for data quality and standardization.

Current (non-historical): The data in the ODS is up to date and reflects the current status of data from the source applications.

Detail: Data in the ODS is primarily used to support operational business functions. This means that there is a specific level of granularity, based on business requirements, that dictates the level of detail the data in the ODS will have.

This lecture covers the difference between a Staging area and an ODS.

OLAP (Online Analytical Processing) is the technology behind many Business Intelligence (BI) applications. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning.

- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast query processing and maintaining data integrity in multi-access environments, and their effectiveness is measured in transactions per second. An OLTP database holds detailed, current data, and the schema used to store transactional databases is the entity model (usually 3NF).

- OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the measure of effectiveness. OLAP applications are widely used by data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).

Please refer to the additional resources of this section, which contain an infographic on the differences between the ODS, DWH, OLTP, OLAP, DSS, and DM (Data Mart).

Test your understanding of ODS, OLAP, OLTP, and Data Warehouse

In this section we talk about all the aspects of a Data Mart and its characteristics, along with how it differs from a DWH.

The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.

Data Warehouse:

Holds multiple subject areas
Holds very detailed information
Works to integrate all data sources
Does not necessarily use a dimensional model but feeds dimensional models.

Data Mart:

Often holds only one subject area- for example, Finance, or Sales
May hold more summarized data (although many hold full detail)
Concentrates on integrating information from a given subject area or set of source systems
Is built focused on a dimensional model using a star schema.

Test your understanding of Data Marts

In this section, we talk about all the concepts of Dimensional Modeling

A Dimensional Model is a database structure that is optimized for online queries and Data Warehousing tools. It consists of "fact" and "dimension" tables. A "fact" is a numeric value that a business wishes to count or sum. A "dimension" is essentially an entry point for getting at the facts.

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are people, products, place, and time. In a data warehouse, dimensions provide structured labeling information to otherwise unordered numeric measures.

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is often located at the center of a star schema, surrounded by dimension tables.

There are four types of facts.

Additive - Measures that can be added across all dimensions.
Non-Additive - Measures that cannot be added across any dimension.
Semi-Additive - Measures that can be added across some dimensions but not others.
Factless fact tables - The fact table does not have aggregate numeric values or information.
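The fact types above can be illustrated with a minimal Python sketch. The rows, column names, and values are hypothetical. Sales amount is treated as additive, an account balance as semi-additive (summed across accounts on one day, but taken from the latest snapshot across days), and a margin percentage as non-additive (recomputed from its additive parts rather than summed).

```python
from collections import defaultdict

# Hypothetical daily snapshot rows: (day, account, sales_amount, balance, cost)
rows = [
    ("2024-01-01", "A", 100.0, 500.0, 60.0),
    ("2024-01-01", "B", 50.0, 300.0, 40.0),
    ("2024-01-02", "A", 200.0, 450.0, 120.0),
    ("2024-01-02", "B", 25.0, 400.0, 20.0),
]

# Additive: sales can be summed across days and accounts alike.
total_sales = sum(r[2] for r in rows)

# Semi-additive: balances may be summed across accounts for one day...
balance_by_day = defaultdict(float)
for day, _acct, _sales, balance, _cost in rows:
    balance_by_day[day] += balance

# ...but across time we take the latest snapshot instead of summing days.
last_day = max(balance_by_day)
closing_balance = balance_by_day[last_day]

# Non-additive: a margin ratio is recomputed from its additive parts,
# never summed or averaged row by row.
total_cost = sum(r[4] for r in rows)
margin_pct = (total_sales - total_cost) / total_sales * 100

print(total_sales, dict(balance_by_day), closing_balance, round(margin_pct, 2))
```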

A surrogate key is a system-generated key, typically a sequential integer with no business meaning, that is declared as the primary key instead of a "real" or natural key. Sometimes there can be several natural keys that could be declared as the primary key, and these are all called candidate keys. A surrogate key is also unique, so it is simply one more candidate key, chosen to serve as the primary key.
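A minimal sketch of surrogate key assignment, assuming the common data warehouse practice of generating a sequential integer during the load (in practice a database sequence or an ETL sequence generator plays this role). The `SurrogateKeyGenerator` class and the sample source-system keys are hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hypothetical helper: maps natural (business) keys to generated integer keys."""

    def __init__(self, start: int = 1) -> None:
        self._next = count(start)  # monotonically increasing sequence
        self._lookup: dict[tuple, int] = {}

    def key_for(self, *natural_key) -> int:
        # Reuse the surrogate if this natural key was already loaded;
        # otherwise draw the next value from the sequence.
        if natural_key not in self._lookup:
            self._lookup[natural_key] = next(self._next)
        return self._lookup[natural_key]


gen = SurrogateKeyGenerator()
# Natural key = (source system, customer id); the last row repeats the first.
customers = [("CRM", "C-1001"), ("ERP", "77"), ("CRM", "C-1001")]
keys = [gen.key_for(src, cust_id) for src, cust_id in customers]
print(keys)  # [1, 2, 1]
```

Because the surrogate carries no business meaning, customer ids arriving from two source systems can be kept apart, and the warehouse key stays stable even if the natural key format changes upstream.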

A star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
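A minimal star schema sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and rows are hypothetical; the point is the shape: one fact table holding foreign keys and numeric measures, surrounded by dimension tables that supply the labels used for filtering and grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes, one row per member.
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)")
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT)")

# Fact table: foreign keys to each dimension plus numeric measures.
cur.execute("""
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    )
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture"), (3, "Phone", "Electronics")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240101, "2024-01-01", "2024-01"), (20240201, "2024-02-01", "2024-02")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(20240101, 1, 2, 2000.0), (20240101, 2, 1, 300.0),
                 (20240201, 3, 3, 1500.0), (20240201, 1, 1, 1000.0)])

# Typical star join: group by dimension attributes, aggregate the fact measures.
result = cur.execute("""
    SELECT d.month, p.category, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(result)
```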

The snowflake schema is diagrammed with each fact surrounded by its associated dimensions (as in a star schema), and those dimensions are further related to other dimensions, branching out into a snowflake pattern.

When choosing a database schema for a data warehouse, star and snowflake schemas tend to be the popular choices. This comparison discusses the suitability of star vs. snowflake schemas in different scenarios, along with their characteristics.

A conformed dimension is a dimension that has exactly the same meaning and content when being referred from different fact tables. A conformed dimension can refer to multiple tables in multiple data marts within the same organization.
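A minimal Python sketch of why conformed dimensions matter, using hypothetical data: a sales fact table and an inventory fact table share one product dimension, so both can be rolled up to the same `category` attribute and their results combined side by side (often called drilling across).

```python
from collections import defaultdict

# One conformed product dimension shared by both fact tables (hypothetical data).
dim_product = {1: "Electronics", 2: "Furniture", 3: "Electronics"}

# Two fact tables from different business processes, both keyed on product_key.
fact_sales = [(1, 2000.0), (2, 300.0), (3, 1500.0)]  # (product_key, sales_amount)
fact_inventory = [(1, 10), (2, 4), (3, 25)]          # (product_key, units_on_hand)


def by_category(fact_rows):
    """Roll a fact table up to the conformed 'category' attribute."""
    totals = defaultdict(float)
    for product_key, measure in fact_rows:
        totals[dim_product[product_key]] += measure
    return totals


sales = by_category(fact_sales)
stock = by_category(fact_inventory)

# Drill-across: because both facts use the same dimension, the rows line up by category.
report = {cat: (sales[cat], stock[cat]) for cat in sorted(sales.keys() | stock.keys())}
print(report)
```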

In a Junk dimension, we combine low-cardinality flags and indicator fields (such as yes/no flags and status codes) into a single dimension. This way, we only need to build a single dimension table, and both the number of fields in the fact table and the size of the fact table can be reduced.
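A minimal Python sketch of a junk dimension, assuming three hypothetical flags (`payment_type`, `is_gift`, `order_channel`). Here the junk dimension is pre-built as every combination of flag values; building only the combinations that actually occur in the data is a common alternative when the full product would be large.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
flag_values = {
    "payment_type": ["CASH", "CARD"],
    "is_gift": ["Y", "N"],
    "order_channel": ["WEB", "STORE"],
}

# Junk dimension: every combination of flag values (2 x 2 x 2 = 8 rows),
# each assigned its own surrogate key starting at 1.
junk_dim = {
    combo: key
    for key, combo in enumerate(product(*flag_values.values()), start=1)
}


def junk_key(payment_type: str, is_gift: str, order_channel: str) -> int:
    """Look up the single key that replaces three flag columns in the fact row."""
    return junk_dim[(payment_type, is_gift, order_channel)]


fact_row = {"order_id": 501, "amount": 49.99, "junk_key": junk_key("CARD", "N", "WEB")}
print(len(junk_dim), fact_row)
```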

In a data warehouse, a degenerate dimension is a dimension key in the fact table that does not have its own dimension table, because all the interesting attributes have been placed in analytic dimensions. The term "degenerate dimension" was coined by Ralph Kimball.

We start with the basic definitions of a Dimension and a Fact, and then move on to Slowly Changing Dimensions (SCD).

There are many approaches to handling SCDs. The most popular are:

Type 0 - The passive method
Type 1 - Overwriting the old value
Type 2 - Creating a new additional record
Type 3 - Adding a new column
Type 4 - Using historical table
Type 6 - Combining the approaches of types 1, 2, and 3 (1+2+3=6)
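A minimal Python sketch contrasting Type 1 and Type 2 on an in-memory dimension table. The row layout (`effective_from`, `effective_to`, `is_current`) and the 9999-12-31 open end date are assumed conventions, not the only ones in use; some designs, for example, set the expired row's end date to the day before the change.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

HIGH_DATE = date(9999, 12, 31)  # assumed "open-ended" end date convention


@dataclass(frozen=True)
class CustomerRow:
    surrogate_key: int
    customer_id: str      # natural key
    city: str             # tracked attribute
    effective_from: date
    effective_to: date
    is_current: bool


def scd1_update(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    return [replace(r, city=new_city) if r.customer_id == customer_id else r for r in rows]


def scd2_update(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and insert a new current version."""
    current: Optional[CustomerRow] = next(
        (r for r in rows if r.customer_id == customer_id and r.is_current), None)
    if current is not None and current.city == new_city:
        return rows  # attribute unchanged: nothing to do
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    out = [replace(r, effective_to=change_date, is_current=False) if r is current else r
           for r in rows]
    out.append(CustomerRow(next_key, customer_id, new_city, change_date, HIGH_DATE, True))
    return out


dim = [CustomerRow(1, "C100", "Pune", date(2020, 1, 1), HIGH_DATE, True)]
type2 = scd2_update(dim, "C100", "Mumbai", date(2024, 6, 1))
type1 = scd1_update(dim, "C100", "Mumbai")
for r in type2:
    print(r.surrogate_key, r.city, r.effective_from, r.effective_to, r.is_current)
```

Type 1 leaves one row with the new city, while Type 2 keeps the Pune row as history (closed on the change date) and adds a new current Mumbai row under a new surrogate key.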

Dimension, Fact and SCD Type 1, 2 and 3 are reviewed in this lecture.

Test your understanding of Dimensional Modeling

Indexing the data warehouse can reduce the amount of time it takes to see query results. When indexing dimensions, you'll want to index on the dimension key. When indexing the fact table, you'll want to index on the date key or the combined date plus time.

A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely or relative to the number of records that contain the data.
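A minimal Python sketch of the bitmap index idea, using Python integers as bit vectors: each distinct value of a low-cardinality column gets one bitmap, and a multi-column predicate becomes a bitwise AND. The columns and data are hypothetical, and production databases use compressed bitmap formats rather than this naive layout.

```python
# Hypothetical low-cardinality columns for 8 fact rows (row ids 0..7).
region = ["N", "S", "N", "E", "S", "N", "E", "N"]
status = ["OPEN", "CLOSED", "CLOSED", "OPEN", "OPEN", "OPEN", "CLOSED", "OPEN"]


def build_bitmap_index(column):
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'N' AND status = 'OPEN' becomes a single bitwise AND.
hits = region_idx["N"] & status_idx["OPEN"]
matching_rows = [i for i in range(len(region)) if (hits >> i) & 1]
print(matching_rows)  # [0, 5, 7]
```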

A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

One of the common questions that come up in interviews is which index is better to use: Bitmap or B-tree?

In this lecture, we evaluate the differences and which one is best to use.

Test your understanding of Data Warehouse Indexes

ETL Vs ELT

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution delivers trusted data from a variety of sources.

ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse.

ELT is a variation of Extract, Transform, Load (ETL). ETL is a data integration process in which transformation takes place on an intermediate server before the data is loaded into the target. In contrast, ELT allows raw data to be loaded directly into the target and transformed there.

In ETL (extract, transform, load) operations, data are extracted from different sources, transformed separately, and loaded to a Data Warehouse database.

In ELT, the extracts are fed into a single staging database that also handles the transformations.
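A minimal sketch of the ETL vs. ELT difference using Python and an in-memory SQLite database as a stand-in target. The `raw_orders` extract and the table names are hypothetical. In the ETL path the cleansing happens in Python before loading; in the ELT path the raw rows are loaded first and the same cleansing is expressed in SQL inside the target.

```python
import sqlite3

# Hypothetical raw extract: customer names are messy, amounts arrive as text in cents.
raw_orders = [("o1", " alice ", "1250"), ("o2", "BOB", "399"), ("o3", "alice", "800")]


def run_etl(conn):
    """ETL: transform in the integration layer (here, Python), then load clean rows."""
    cleaned = [(oid, name.strip().lower(), int(cents) / 100) for oid, name, cents in raw_orders]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load raw rows as-is, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, customer TEXT, amount_cents TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_orders)
    conn.execute("""
        CREATE TABLE orders_elt AS
        SELECT order_id,
               LOWER(TRIM(customer))                 AS customer,
               CAST(amount_cents AS INTEGER) / 100.0 AS amount
        FROM orders_raw
    """)


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)

# Table names are fixed constants here, so string formatting is safe.
q = "SELECT customer, SUM(amount) FROM {} GROUP BY customer ORDER BY customer"
etl_result = conn.execute(q.format("orders_etl")).fetchall()
elt_result = conn.execute(q.format("orders_elt")).fetchall()
print(etl_result, elt_result)
```

Both paths produce the same result; what differs is where the transformation work runs.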

The list is not exhaustive, but here are some of the terms commonly used in any ETL project.

•Source Systems
•Mapping
•Metadata
•Staging Area
•Data Cleansing/Scrubbing
•MDM - Golden Source
•Transformation
•Target Systems
•Reporting/BI
•Scheduling

This lecture is in response to the question below:

Could you please elaborate on MDM - Golden Source? What does MDM store? Where MDM is implemented, does MDM store the dimension data while the data warehouse stores the fact data?

Test your understanding of ETL vs. ELT

In this section we talk about the different Tools/Technologies used in DWH/BI/ETL

In this lecture we talk about the different enterprise databases that can be used as a Data Warehouse.

Please note, NoSQL databases are not discussed in this lecture.

In this lecture we talk about the different popular ETL tools available in the market.

Based on Gartner's Magic Quadrant, we see which ETL tool leads among ETL technologies and which is the best choice for you to learn.

Test your understanding of the different types of ETL Tools

This section outlines the basic roles and responsibilities of an ETL developer

The daily activities and the roles and responsibilities of an ETL developer are mentioned. These are covered considering the involvement of the ETL developer at various phases of the Data Warehouse implementation life cycle.

This lecture (Part 2) continues the previous lecture and covers further responsibilities of an ETL developer.

This lecture (Part 3) continues the previous lecture and covers further responsibilities of an ETL developer.

Test your understanding of the Roles and Responsibilities of an ETL developer

This section focuses on the different components in the Informatica architecture for various services. You will get in-depth details of each component in the architecture.

The Informatica Domain is the fundamental administrative unit in the Informatica tool. In this lecture we talk about the overall architecture and how the domain is linked with the rest of the components in the architecture.

A node is a logical representation of a machine or a blade. Each node runs a Service Manager that performs domain operations on that node.

Different types of nodes are discussed in this lecture.

Master Gateway Node
Primary Node
Backup node
Worker Node

Test your understanding of the Informatica PowerCenter Architecture

You will learn the different ways to find the right information about the version you need before downloading and installing the software.

This lecture describes what you need to get started, both on your personal PC and at work.

The PAM (Product Availability Matrix) is the right place to start for all pre-installation checks on which versions of supporting software are compatible with which version of Informatica.

This session shows how to download the free software for Informatica 9.6 and Oracle 11g or 12c from the Oracle eDelivery website.

There are about 16 different files and 3 different versions of Informatica Adapters to download. This lecture shows which files to download.

Oracle 11g Installation and SQL Developer Configuration

This session shows how to extract the client and server executables from the .ZIP and .gz files downloaded from the Oracle eDelivery website.

Step-by-step process for installing the Informatica Server and setting up the Informatica Service, with an explanation of all the available options and the port numbers.

Step-by-step process for completing the Client Installation for PowerCenter and the other available client options for Informatica Data Quality and Transformation Studio.

This section focuses on configuring the Application Services and understanding their different properties.

This session provides an overview of the Administration Console page layout and its tabs. It covers the differences between the 8.6 and 9.x web page layouts, Log Management, and the basic differences between Monitoring in the Administration Console and the client Informatica Monitor tool.

This session explains the services available in the Administration Console and their purpose, the order in which the services should be created, and their dependencies. Common issues and fixes are also discussed.

Step-by-step process for choosing the properties when creating the Repository Service.

This session explains the default properties set when the service is created, which options can be updated and to what values, the implications of those changes, and the scenarios in which they can be changed.

The Integration Service is created while installing PowerCenter. After the Integration Service is created, we use the Administration Console to manage it.

What is the Integration Service?

The Integration Service reads workflow information from the Informatica Repository. It creates one or more Integration Service processes to manage workflows. When we run a workflow, the Integration Service locks the workflow and runs the workflow tasks and sessions.

All the properties of the Integration Service are covered in this lecture.

After the services are created, this lecture discusses how the repository is configured and how the folders are created.

In this lecture we see how to stop and restart the different Informatica services from the Administration Console.

Test your understanding of the Informatica PowerCenter Administration Console

In this section we talk about the different versions of Informatica PowerCenter and their features

New Features of version 10 at a glance & the overall impact:

More features for Business Users on Data Analyst and Metadata management
Improved Administrator experience
Built-in intelligence to improve performance
Better designs with enforced best practices on code development; may drastically reduce development effort for common use cases
Code integration with external Software Configuration tools
Not much for Architects
A hardware upgrade is a must for V10
Many of the new features are in the Data Integration and Data Governance areas
Traditional users will not be very impressed, since there are no new features in traditional products like PC, MDM, and B2B

In this section we go through the installation and configuration steps for Informatica version 10.

Server/Client installation steps for Informatica version 10. The steps are almost the same as before, with a few additions for the new features.

Informatica 10's Administration Console is explained, along with all its new features.

Good to know

Know what's good, what to watch for, and possible dealbreakers

Builds foundational skills for reporting, analysis, and BI functions

Focuses on the practical use of ETL tools in data warehousing environments, which is core to many jobs

Provides a comprehensive introduction to ETL and all the steps of the process in a way that is easy to understand

Covers the use of SQL queries and data manipulation techniques, which are essential for ETL developers

Led by instructors who have industry experience, which is an indicator of their knowledge in the field

Reviews summary

High-quality Informatica course

Learners say they are enjoying their experience with this beginner-level course on the basics of Informatica. The course has a largely positive reception, with students remarking about engaging assignments and practical knowledge. Students say this course provides sound foundational knowledge for students looking to learn more about Informatica. Instructors are described as knowledgeable and supportive. One student mentioned they'd like the course to provide PowerPoint slides, but overall, the sentiment is one of strong satisfaction.

The course offers useful, hands-on knowledge.

"I'm really enjoying the course. I'm learning a lot and I'm finding the material to be very helpful."

"The assignments are engaging and the instructors are very supportive."

"The course is well-paced and the instructors are very knowledgeable."

Students expressed desire for the course to provide PowerPoint slides.

"i am enjoying it...but it would be great if you share the ppt"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Informatica Tutorial: Beginner to Expert Level with these activities:

Participate in Study Group Discussions

Engaging in study group discussions will allow for peer learning and reinforcement of course concepts.

Join or form a study group with other learners taking the course.
Meet regularly to discuss course materials and share insights.
Actively participate in discussions and contribute to the group's understanding.

Follow Tutorials on Data Warehousing Concepts

Watching tutorials will augment the understanding of data warehouse concepts covered in the course.

Search for online tutorials on data warehousing concepts.
Choose tutorials that align with the course topics.
Take notes and summarize the key points of each tutorial.

Practice Dimensional Modeling Exercises

Practicing exercises will solidify the understanding of dimensional modeling concepts discussed in the course.

Find online exercises or quizzes on dimensional modeling.
Solve the exercises and review the solutions.
Identify areas where improvement is needed and focus on those concepts.

Write a Summary of Data Warehousing Architecture

Writing a summary will reinforce the understanding of data warehousing architecture explained in the course.

Gather notes and materials on data warehousing architecture.
Organize and structure the information into a coherent summary.
Review and edit the summary to ensure clarity and accuracy.

Career center

Learners who complete Informatica Tutorial: Beginner to Expert Level will develop knowledge and skills that may be useful to these careers:

Data Architect

Data architects are responsible for designing and maintaining the data infrastructure of an organization. They work with business stakeholders to understand data requirements and then design and implement data solutions that meet those requirements. This course can help you develop the skills you need to be a successful data architect by providing you with a strong foundation in data modeling, data integration, and data governance.

Data Warehouse Developer

Data warehouse developers are responsible for building and maintaining data warehouses. They work with data architects and data engineers to design and implement data warehouse solutions. This course can help you develop the skills you need to be a successful data warehouse developer by providing you with a strong foundation in data modeling, data integration, and data quality.

ETL Developer

ETL developers are responsible for building and maintaining the data pipelines that move data from source systems to target systems. They work with data architects and data engineers to design and implement data solutions. This course can help you develop the skills you need to be a successful ETL developer by providing you with a strong foundation in data integration, data transformation, and data quality.

Data Integration Specialist

Data integration specialists are responsible for integrating data from multiple sources into a single, cohesive data warehouse. They work with data architects and data engineers to design and implement data integration solutions. This course can help you develop the skills you need to be a successful data integration specialist by providing you with a strong foundation in data integration, data transformation, and data quality.

Data Engineer

Data engineers are responsible for building and maintaining the data pipelines that move data from source systems to target systems. They work with data architects to design data solutions and then implement those solutions using data integration tools and technologies. This course can help you develop the skills you need to be a successful data engineer by providing you with a strong foundation in data integration, data transformation, and data quality.

Enterprise Architect

Enterprise architects are responsible for designing and implementing the technology infrastructure of an organization. They work with business stakeholders to understand the organization's needs and then design and implement technology solutions that meet those needs. This course can help you develop the skills you need to be a successful enterprise architect by providing you with a strong foundation in data architecture, data integration, and data governance.

Information Architect

Information architects are responsible for designing and implementing the information architecture of an organization. They work with business stakeholders to understand the organization's information needs and then design and implement information solutions that meet those needs. This course can help you develop the skills you need to be a successful information architect by providing you with a strong foundation in data modeling, data integration, and data governance.

Technical Architect

Technical architects are responsible for designing and implementing the technology infrastructure of an organization. They work with business stakeholders to understand the organization's needs and then design and implement technology solutions that meet those needs. This course can help you develop the skills you need to be a successful technical architect by providing you with a strong foundation in data architecture, data integration, and data governance.

Data Scientist

Data scientists are responsible for using data to solve business problems. They work with data engineers to access and prepare data, and then use data analysis and machine learning techniques to develop predictive models and other data-driven solutions. This course can help you develop the skills you need to be a successful data scientist by providing you with a strong foundation in data analysis, data mining, and machine learning.

Business Analyst

Business analysts are responsible for understanding the business needs of an organization and then developing solutions to meet those needs. They work with stakeholders to gather requirements, analyze data, and develop recommendations. This course can help you develop the skills you need to be a successful business analyst by providing you with a strong foundation in data analysis, data modeling, and business process modeling.

Data Governance Analyst

Data governance analysts are responsible for developing and implementing data governance policies and procedures. They work with data architects, data engineers, and data scientists to ensure that data is used in a consistent and ethical manner. This course can help you develop the skills you need to be a successful data governance analyst by providing you with a strong foundation in data governance, data security, and data privacy.

Data Quality Analyst

Data quality analysts are responsible for ensuring that data is accurate, complete, and consistent. They work with data architects, data engineers, and data scientists to develop and implement data quality solutions. This course can help you develop the skills you need to be a successful data quality analyst by providing you with a strong foundation in data quality, data governance, and data cleansing.

Systems Analyst

Systems analysts are responsible for analyzing and designing business systems. They work with business stakeholders to understand the business needs and then design and implement systems that meet those needs. This course can help you develop the skills you need to be a successful systems analyst by providing you with a strong foundation in data modeling, data integration, and data governance.

Database Administrator

Database administrators are responsible for managing and maintaining databases. They work with data architects and data engineers to design and implement database solutions. This course can help you develop the skills you need to be a successful database administrator by providing you with a strong foundation in database management, data security, and data recovery.

Software Engineer

Software engineers are responsible for designing, developing, and maintaining software applications. They work with business stakeholders to understand the application requirements and then design and develop applications that meet those requirements. This course can help you develop the skills you need to be a successful software engineer by providing you with a strong foundation in software development, data integration, and data management.
