Latest Questions
Post Top Ad
Your Ad Spot
Tuesday, June 18, 2019

40 Top Oracle ETL Interview Questions and Answers {Updated}

Oracle ETL Interview Questions and Answers for experienced PDF, Read commonly asked Oracle ETL Job Interview Questions with Answers PDF for Freshers.


Read Oracle ETL Interview Questions and Answers

What are the various tools? Name a few?
A few more
– Cognos Decision Stream
– Oracle Warehouse Builder
– Business Objects XI (Extreme Insight)
– SAP Business Warehouse
– SAS Enterprise ETL Server
Along with the above, need to include the below tools
– Informatica
– Abintio
– DataStage


What are snapshots? What are materialized views & where do we use them? What is a materialized view?
The materialized view is a view in which data is also stored in some temp table. i.e if we will go with the View concept in DB in that we only store query and once we call View it extract data from DB. But In materialized View data is stored in some temp tables.


What is factless fact table? where you have used it in your project?
A factless table means only the key available in the Fact there are no measures available.
factless fact table means that contains only foreign keys without any measures example: attendance report of employees in a particular company contains no measures only


Can we look-up a table from source qualifier transformation. ie. unconnected lookup?
You cannot look up from a source qualifier directly. However, you can override the SQL in the source qualifier to join with the lookup table to perform the lookup.


Where do we use connected and unconnected lookups?
If return port only one then we can go for unconnected. More than one return port is not possible with Unconnected. If more than one return port then goes for Connected.
If you require dynamic cache i.e where your data will change dynamically then you can go for connected lookup. If your data is static where your data won’t change when the session loads you can go for unconnected lookups


Where do we use semi and nonadditive facts?
Additive: A measure can participate in arithmatic calculations using all or any dimensions.
Ex: Sales profit
Semi-additive: A measure can participate in arithmatic calculations using some dimensions.
Ex: Sales amount
Non-Additve: A measure can’t participate in arithmetic calculations using dimensions.
Ex: temperature


What are non-additive facts in detail?
A fact may be measure, metric or a dollar value. Measure and metric are non-additive facts.
Dollar value is additive fact. If we want to find out the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
A non-additive fact, for eg measure height(s) for ‘citizens by geographical location’, when we rollup ‘city’ data to ‘state’ level data we should not add heights of the citizens rather we may want to use it to derive ‘count’


What is a staging area? Do we need it? What is the purpose of a staging area?
Data staging is actually a collection of processes used to prepare source system data for loading a data warehouse. Staging includes the following steps:
Source data extraction, Data transformation (restructuring), Data transformation (data cleansing, value transformations), Surrogate key assignments


What is a three-tier data warehouse?
A data warehouse can be thought of as a three-tier system in which a middle system provides usable data in a secure way to end users. On either side of this middle system are the end users and the back-end data stores.


What are the various methods of getting incremental records or delta records from the source systems?
One foolproof method is to maintain a field called ‘Last Extraction Date’ and then impose a condition in the code saying ‘current_extraction_date > last_extraction_date’.


What are the various tools? – Name a few?
A few are ….
– Abinitio
– DataStage
– Informatica
– Cognos Decision Stream
– Oracle Warehouse Builder
– Business Objects XI (Extreme Insight)
– SAP Business Warehouse
– SAS Enterprise ETL Server


What is the latest version of Power Center / Power Mart?
The Latest Version is 7.2


What is the difference between Power Center & Power Mart?
PowerCenter – ability to organize repositories into a data mart domain and share metadata across repositories.
PowerMart – only local repository can be created.


What is the various transformation available?
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Transformation


What is ODS (operation data source)?
ODS – Operational Data Store.
ODS Comes between staging area & Data Warehouse. The data is ODS will be at a low level of granularity.
Once data was populated in ODS aggregated data will be loaded into EDW through ODS.


What is the difference between the etl tool and OLAP tools?
An ETL tool is meant for extraction data from the legacy systems and loads into the specified database with some process of cleansing data.
ex: Informatica,data stage ….etc
OLAP is meant for Reporting purpose.in OLAP data available in Mulitidimectional model. so that u can write a simple query to extract data from the database.
ex: Business objects, Cognos….etc


What is the metadata extension?
Informatica allows end users and partners to extend the metadata stored in the repository by associating the information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.
Informatica Client applications can contain the following types of metadata extensions:
Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete and view user-defined metadata extensions. You can also change the values of user-defined extensions.


What are the various test procedures used to check whether the data is loaded in the backend, performance of the mapping, and quality of the data loaded in INFORMATICA?
The best procedure to take help of debugger where we monitor each and every process of mappings and how data is loading based on conditions breaks.


I am new to SAS, can anybody explain the process of extracting data from source systems, storing in ODS and how data modeling is done?
There are various ways of Extracting Data from Source Systems. For example, You can use a DATA step, an Import Process. It depends on your input data styles. What kind of File/database it is residing in. Storing ur data in an ODS can be done thru an ODS stmt/export stmt/FILE stmt, again which depends on the file & data format, You want your output to be in.


Techniques of Error Handling – Ignore, Rejecting bad records to a flat file, loading the records and reviewing them (default values)?
Rejection of records either at the database due to constraint key violation or the Informatica server when writing data into the target table. These rejected records we can find in the bad files folder where a reject file will be created for a session.we can check why a record has been rejected. And this bad file contains the first column a row indicator and second column a column indicator.
These row indicators or of four types
D-valid data,
O-overflowed data,
N-null data,
T- Truncated data,
And depending on these indicators we can change to load data successfully to target.


What is Full load & Incremental or Refresh load?
Full Load: completely erasing the contents of one or more tables and reloading with fresh data.
Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.


How to determine what records to extract?
When addressing a table some dimension key must reflect the need for a record to get extracted. Mostly it will be from time dimension (e.g. date >= 1st of current mth) or a transaction flag (e.g. Order Invoiced Stat). Foolproof would be adding an archive flag to record which gets reset when the record changes.


Do we need an ETL tool? When do we go for the tools in the market?
ETL Tool:
It is used to Extract(E) data from multiple source systems(like RDBMS,Flat files,Mainframes,SAP,XML etc) transform(T) them based on Business requirements and Load(L) in target locations.(like tables,files etc).
The need of ETL Tool: An ETL tool is typically required when data scattered across different systems.(like RDBMS,Flat files,Mainframes,SAP,XML etc).


Can we use procedural logic inside Informatica? If yes how, if now how can we use external procedural logic in Informatica?
Yes, you can use advanced external transformation. for more detail, you can refer the manual of Informatica transformation guide in that advance external transformation. You can use c++ language on Unix and c++, vb vc++ on windows server.


Can we override a native SQL query within Informatica? Where do we do it? How do we do it?
Yes, we can override a native SQL query in source qualifier and lookup transformation.
In lookup transformation, we can find “SQL override” in lookup properties. by using this option we can do this.


What are the parameter files? Where do we use them?
Parameter file defines the value for parameter and variable used in a workflow, worklet or session.


How can we use mapping variables in Informatica? Where do we use them?
Yes. we can use mapping variable in Informatica.
The Informatica server saves the value of mapping variable to the repository at the end of the session run and uses that value next time we run the session.


What is a mapping, session, worklet, workflow, mapplet?
A mapping represents dataflow from sources to targets.
A mapplet creates or configures a set of transformations.
A workflow is a set of instruction that tells the Informatica server how to execute the tasks.
A worklet is an object that represents a set of tasks.
A session is a set of instructions that describe how and when to move data from sources to targets.


What is Informatica Metadata and where is it stored?
Informatica Metadata is data about data which stores in Informatica repositories.


How do we call shell scripts from Informatica?
Specify the Full path of the Shell script the “Post session properties
of session/workflow”.


Can Informatica load heterogeneous targets from heterogeneous sources?
yes, you can use heterogenous source and target in a single mapping. But to join data from a heterogenous source you have to use joiner transformation.


What are the different Lookup methods used in Informatica?
1. Connected lookup
2. Unconnected lookup
The connected lookup will receive input from the pipeline and sends output to the pipeline and can return any number of values.it does not contain return port.
Unconnected lookup can return only one column. it contains return to port.


What are the active transformation / Passive transformations?
Active transformation can change the number of rows that pass through it. (decrease or increase rows)
Passive transformation cannot change the number of rows that pass through it.


What are the modules in Power Mart?
1. PowerMart Designer
2. Server
3. Server Manager
4. Repository
5. Repository Manager


Compare ETL & Manual development?
ETL – The process of extracting data from multiple sources. (ex. flat files, XML, COBOL, SAP, etc) is simpler with the help of tools.
Manual – Loading the data other than flat files and Oracle table need more effort.
ETL – High and clear visibility of logic.
Manual – complex and not so user-friendly visibility of logic.
ETL – Contains Metadata and changes can be done easily.
Manual – No Metadata concept and changes need more effort.
ETL- Error handling, log summary, and load progress makes life easier for the developer and maintainer.
Manual – need maximum effort from a maintenance point of view.
ETL – Can handle Historic data very well.
Manual – as data grows the processing time degrades.
These are some differences b/w manual and ETL development.


When do we Analyze the tables? How do we do it?
The ANALYZE statement allows you to validate and compute statistics for an index, table, or cluster. These statistics are used by the cost-based optimizer when it calculates the most efficient plan for retrieval. In addition to its role in statement optimization, ANALYZE also helps in validating object structures and in managing space in your system. You can choose the following operations: COMPUTER, ESTIMATE, and DELETE. An early version of Oracle7 produced unpredictable results when the ESTIMATE operation was used. It is best to compute
your statistics.
EX:
select OWNER,
sum(decode(nvl(NUM_ROWS,9999), 9999,0,1)) analyzed,
sum(decode(nvl(NUM_ROWS,9999), 9999,1,0)) not_analyzed,
count(TABLE_NAME) total
from dba_tables
where OWNER not in (‘SYS’, ‘SYSTEM’)
group by OWNER


What is partitioning? What are the types of partitioning?
If you use PowerCenter, you can increase the number of partitions in a pipeline to improve session performance. Increasing the number of partitions allows the Informatica Server to create multiple connections to sources and process partitions of source data concurrently.
When you create a session, the Workflow Manager validates each pipeline in the mapping for partitioning. You can specify multiple partitions in a pipeline if the Informatica Server can maintain data consistency when it processes the partitioned data.
When you configure the partitioning information for a pipeline, you must specify a partition type at each partition point in the pipeline.
The partition type determines how the Informatica Server redistributes data across partition points.
The Workflow Manager allows you to specify the following partition types:
Round-robin partitioning. The Informatica Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
For more information, see Round-Robin Partitioning.
Hash partitioning. The Informatica Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the Informatica Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify the number of ports to form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows
with the same partition key in the same partition. For more
information, see Hash Partitioning.
Key range partitioning. You specify one or more ports to form a compound partition key. The Informatica Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range. For more information, see Key Range Partitioning.
Pass-through partitioning. The Informatica Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.


What are snapshots? What are materialized views & where do we use them? What is a materialized view log?
Snapshots are read-only copies of a master table located on a remote node which is periodically refreshed to reflect changes made to the master table. Snapshots are mirror or replicas of tables.
Views are built using the columns from one or more tables. The Single Table View can be updated but the view with multi-table cannot be updated.
A View can be updated/deleted/inserted if it has only one base table if the view is based on columns from one or more tables then insert, update and delete is not possible.
Materialized view
A pre-computed table comprising aggregated or joined data from fact and possibly dimension tables. Also known as a summary or aggregate table.

Post Top Ad

Your Ad Spot

Pages