Tuesday, June 18, 2019

90 Top Data Warehouse Interview Questions and Answers {Updated}

Commonly asked Data Warehouse job interview questions with answers, for freshers and experienced candidates.

Read Data Warehouse Interview Questions and Answers

What is a source qualifier?
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads when it executes a session.

What is data warehouse architecture?
A data warehouse is a repository of integrated data extracted from heterogeneous sources. The architecture starts with the different sources, such as Oracle, flat files, and ERP systems. After the sources come the staging area and the data warehouse itself, then the different data marts, and finally the reports. It also has the ODS (Operational Data Store). This complete arrangement is called the data warehousing architecture.

What are data validation strategies for data mart validation after the loading process?
Data validation is to make sure that the loaded data is accurate and meets the business requirements. Strategies are different methods followed to meet the validation requirements.

What is Virtual Data Warehousing?
A virtual or point-to-point data warehousing strategy means that end-users are allowed to get at operational databases directly, using whatever tools are enabled on the “data access network”.

What is a View?
A view is a virtual table. Every view has a Query attached to it. (The Query is a SELECT statement that identifies the columns and rows of the table(s) the view uses.)

Can objects of the same Schema reside in different tablespaces?
Yes.

What are the steps involved in Database Startup?
Start an instance, Mount the Database and Open the Database.

What is the data type of the surrogate key?
The data type of the surrogate key is integer, numeric, or number.

What is data analysis? Where is it used?
Data analysis: suppose you run a business and record its data in some form, say in a register or on a computer. At year-end, you want to know the profit or loss; working that out is data analysis. Where it is used: to find out which product sold the most and, if the business is running at a loss, to find where things went wrong.

What are the data types present in BO? What happens if we implement the view in the designer and report?
There are three different data types: Dimension, Measure, and Detail. A view is nothing but an alias, and it can be used to resolve loops in the universe.

What is the difference between metadata and data dictionary?

Metadata is information about data. It contains information about the graphs, their related files, Ab Initio commands, server information, and so on, i.e. all kinds of project-related information.

What is the Extent?
An Extent is a specific number of contiguous data blocks, obtained in a single allocation, and used to store a specific type of information.

Can a Tablespace hold objects from different Schemas?
Yes.

Which parameter specified in the DEFAULT STORAGE clause of CREATE TABLESPACE cannot be altered after creating the tablespace?

All the default storage parameters defined for the tablespace can be changed using the ALTER TABLESPACE command. Once objects are created, their INITIAL and MINEXTENTS values cannot be changed.

What are the steps to build the data warehouse?

Gathering business requirements >> Identifying Sources >> Identifying Facts >> Defining Dimensions >> Define Attributes >> Redefine Dimensions / Attributes >> Organize Attribute Hierarchy >> Define Relationship >> Assign Unique Identifiers

What are data modeling and data mining? Where are they used?
Data modeling is the process of designing a database model. In this data model, data is stored in two types of table: the fact table and the dimension table. The fact table contains the transaction data and the dimension table contains the master data. Data mining is the process of finding hidden trends in data.

What is a surrogate key? Where do we use it? Explain with examples.

The surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used for the primary key to the table.

The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the primary keys of dimension tables. They can use the Informatica Sequence Generator, an Oracle sequence, or SQL Server identity values for the surrogate key.

It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME OR CITY_NAME which are stated as the primary keys (according to the business users) but, not only can these change, indexing on a numerical value is probably better and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and as far as the client is concerned, you may display only the AIRPORT_NAME.
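
The AIRPORT_ID idea above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the table name dim_airport, its columns, and the sample airport names are hypothetical, and SQLite's INTEGER PRIMARY KEY stands in for an Oracle sequence or a SQL Server identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# airport_id is the surrogate key: SQLite assigns it automatically.
conn.execute(
    "CREATE TABLE dim_airport ("
    " airport_id INTEGER PRIMARY KEY,"
    " airport_name TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO dim_airport (airport_name) VALUES ('Bombay')")
conn.execute("INSERT INTO dim_airport (airport_name) VALUES ('Madras')")

# The natural key (the name) changes, but the surrogate key does not,
# so rows in other tables that reference airport_id are unaffected.
conn.execute(
    "UPDATE dim_airport SET airport_name = 'Mumbai' "
    "WHERE airport_name = 'Bombay'"
)
rows = conn.execute(
    "SELECT airport_id, airport_name FROM dim_airport ORDER BY airport_id"
).fetchall()
print(rows)
```

The renamed airport keeps its original airport_id, which is the property that makes a surrogate key safer to join on than a name.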

What is the difference between the mapping parameter & mapping variable in data warehousing?
A mapping parameter defines a constant value that cannot change throughout the session. A mapping variable defines a value that can change throughout the session.

What is an Index?
An Index is an optional structure associated with a table to have direct access to rows, which can be created to increase the performance of data retrieval. The index can be created on one or more columns of a table.
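
As a rough illustration of how an index changes data retrieval, the sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The emp table, the idx_emp_dept index, and the sample rows are hypothetical, and the exact plan wording differs between SQLite versions and from what Oracle would report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("Asha", "HR"), ("Ravi", "IT"), ("Meena", "HR")],
)

query = "SELECT name FROM emp WHERE dept = 'HR'"

def plan(conn, sql):
    """Return the detail text of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

plan_before = plan(conn, query)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")
plan_after = plan(conn, query)   # the index on dept is now used

print(plan_before)
print(plan_after)
names = sorted(row[0] for row in conn.execute(query))
print(names)
```

The query returns the same rows before and after; only the access path changes, which is why an index is described as optional.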

Can a View be based on another View?
Yes.

What is On-line Redo Log?
The On-line Redo Log is a set of two or more online redo files that record all committed changes made to the database. Whenever a transaction is committed, the corresponding redo entries temporarily stored in redo log buffers of the SGA are written to an on-line redo log file by the background process LGWR. The on-line redo log files are used in a cyclical fashion.

What are the advantages of data mining over traditional approaches?
Data Mining is used for the estimation of the future. For example, if we take a company/business organization, by using the concept of Data Mining, we can predict the future of the business in terms of Revenue (or) Employees (or) Customers (or) Orders, etc. Traditional approaches use simple algorithms for estimating the future. However, they do not give accurate results when compared to Data Mining.

What is “Method/1”?
Method/1 is the system development lifecycle created by Arthur Andersen a while back.

What is a linked cube?
A linked cube is a cube in which a subset of the data can be analyzed in detail. The linking ensures that the data in the cubes remains consistent.

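Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs on?
The basic advantage of RAID is to speed up reading data from the permanent storage device (hard disk) and to protect the data against disk failure. RAID 1 mirrors data across disks, RAID 1/0 combines mirroring with striping, and RAID 5 stripes data with parity. Transaction logs are write-intensive, so they are usually placed on RAID 1 or RAID 1/0 rather than RAID 5, which pays a parity cost on every write.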

What are Integrity Constraints?
An integrity constraint is a declarative way to define a business rule for a column of a table.

What is a full backup?
A full backup is an operating system backup of all data files, on-line redo log files, and the control file that constitute the ORACLE database, together with the parameter file.

What is Log Switch?
The point at which ORACLE ends writing to one online redo log file and begins writing to another is called a log switch.

What is the difference between view and materialized view?
View: stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes. Materialized view: stores the results of the SQL in table form in the database. The SQL statement executes only once, and after that, every time you run the query, the stored result set is used. The main advantage is quick query results.
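
The difference can be seen in a small sketch with Python's sqlite3 module. SQLite has no materialized views, so a table built with CREATE TABLE ... AS SELECT is used here as a stand-in for the stored result set; that substitution, and the names sales, v_total, and mv_total, are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100,), (200,)])

# A view stores only the query; it is re-run on every access.
conn.execute("CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales")
# Stand-in for a materialized view: the result set is computed and stored once.
conn.execute("CREATE TABLE mv_total AS SELECT SUM(amount) AS total FROM sales")

# New data arrives after both objects were created.
conn.execute("INSERT INTO sales VALUES (50)")

view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]
stored_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]
print(view_total, stored_total)
```

The view reflects the new row immediately, while the stored result stays as it was until it is rebuilt. A real materialized view is kept up to date by a refresh rather than by re-creating the table.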

After a report is generated, to whom do we deploy it, and what do we do once the report is complete?
The generated report will be sent to the concerned business users through web or LAN.

What are the Characteristics of Data Files?
A data file can be associated with only one database. Once created, a data file cannot change size (later Oracle releases do allow data files to be resized). One or more data files form a logical unit of database storage called a tablespace.

What are Clusters?
Clusters are groups of one or more tables that are physically stored together because they share common columns and are often used together.

What is Mirrored on-line Redo Log?
A mirrored on-line redo log consists of copies of on-line redo log files physically located on separate disks; changes made to one member of the group are made to all members.

How can we run the graph? What is the procedure for that? How can we schedule the graph in UNIX?
If you want to run the graph through the GDE, save the graph and then press the F5 key; it will run automatically. If you want to run it through a shell script, you have to run the command on your UNIX box.

What is Dimensional Modelling?

Dimensional Modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model, all the data is stored in two types of tables: the fact table and the dimension table. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e., the dimensions on which the facts are calculated.
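
A minimal sketch of a fact table with two dimension tables, written with Python's sqlite3 module, may make this concrete. The table names (fact_sales, dim_product, dim_date), columns, and figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the context of the measurements.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds the measurements plus keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount INTEGER
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO dim_date VALUES (10, 2018), (11, 2019);
INSERT INTO fact_sales VALUES (1, 10, 30), (1, 11, 50), (2, 11, 20);
""")

# Facts are aggregated along the chosen dimensions.
totals = conn.execute("""
SELECT p.category, d.year, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
""").fetchall()
print(totals)
```

Each row of the result is a measurement (the summed amount) described by its dimensions (category and year).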

What is the main difference between Inmon and Kimball philosophies of data warehousing?

Both differed in their concept of building the data warehouse. Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts. Hence, a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level. Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence, the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.

After the complete generation of a report, who will test the report and who will analyze it?
After the completion of reporting, reports will be sent to business analysts. They will analyze the data from different points of view so that they can make proper business decisions.

What is Rollback Segment?
A Database contains one or more Rollback Segments to temporarily store “undo” information.

What are the different types of Segments?

Data Segment,

Index Segment,

Rollback Segment,

Temporary Segment

What is a Partial Backup?
A Partial Backup is an operating system backup short of a full backup, taken while the database is open or shut down.

What is a real-time data warehouse? How is it different from near to real-time data warehouse?
As the term suggests, a real-time data warehouse is a system that reflects all changes to its sources in real time. As simple as it sounds, this is still an area of active research in the field. In a traditional DWH, the operational system(s) are kept separate from the DWH for a good reason. The operational systems are designed to accept inputs or changes to data regularly, and hence have a good chance of being regularly queried. On the other hand, a DWH is supposed to do just the opposite: it is used to query data for reports only. No changes to data through user actions are expected (or designed for). The only inputs come from the ETL feed at stipulated times. The ETL sources its data from the operational systems just explained above.

To create a real-time DWH we would have to merge both systems (several ways are being explored), a concept that goes against the reason for creating a DWH. Bigger challenges occur in updating aggregated data in facts in real time while still maintaining the surrogate keys. Besides, we would need lightning-fast hardware to try this. A near real-time DWH is a trade-off between the conventional design and the dream of all clients today. The frequency of ETL updates is higher in this case, e.g. once every 2 hours. We can also analyze and use selective refreshes at shorter time intervals, while complete refreshes may still be kept further apart. Selective refreshes would look at only those tables that get updated regularly.

What is the difference between Snowflake and Star Schema? What are situations where Snowflake Schema is better than Star Schema to use and when the opposite is true?
Star schema: It contains dimension tables mapped around one or more fact tables. It is a denormalized model, so there is no need for complicated joins, and queries return results fast. Snowflake schema: It is the normalized form of the star schema. It contains in-depth joins because the tables are split into many pieces. We can easily make modifications directly in the tables, but we have to use complicated joins since we have more tables, so there will be some delay in processing the query.

What is a junk dimension? What is the difference between the junk dimension and degenerated dimension?
Junk dimension: Grouping random flags and text attributes from a dimension and moving them to a separate sub-dimension. Degenerate dimension: Keeping the control information on the fact table. For example, consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case, the dimension is removed and the order information is stored directly in the fact table.

Can you pass sql queries in filter transformation?
We cannot use SQL queries in filter transformation. It will not allow you to override default SQL query like other transformations (Source Qualifier, lookup)

What is Tablespace?
A database is divided into logical storage units called tablespaces. A tablespace is used to group related logical structures together.

Explain the relationship between Database, Tablespace, and Data file?
Each database is logically divided into one or more tablespaces. One or more data files are explicitly created for each tablespace.

What is Restricted Mode of Instance Startup?
An instance can be started in (or later altered to be in) restricted mode so that when the database is open connections are limited only to those whose user accounts have been granted the RESTRICTED SESSION system privilege.

What is the difference between drill & scope of analysis?
Drilling can be done in drill down, up, through, and across; the scope is the overall view of the drill exercise.

What is a cube in the data warehousing concept?
Cubes are a logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

Why is the fact table in normal form?
The fact table consists of the index keys of the dimension/lookup tables and the measures. Because the table holds only these keys and the measures, it is in normal form.

Where is data cube technology used?
Data cube technology is built on a multi-dimensional structure called the data cube, a data abstraction that allows one to view aggregated data from a number of perspectives. Conceptually, the cube consists of a core or base cuboid, surrounded by a collection of sub-cubes/cuboids that represent the aggregation of the base cuboid along one or more dimensions. We refer to the dimension to be aggregated as the measure attribute, while the remaining dimensions are known as the feature attributes.
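
To make the base cuboid and its sub-cuboids concrete, here is a small sketch in plain Python. The sample rows and the helper name build_cube are hypothetical; the feature attributes are assumed to be region, product, and year, and the amount is the measure attribute being aggregated.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical base data: region, product, year, then the sales amount.
rows = [
    ("East", "Pen", 2018, 10),
    ("East", "Ink", 2018, 5),
    ("West", "Pen", 2019, 7),
]
dims = ("region", "product", "year")

def build_cube(rows, dims):
    """Return {kept_dims: {dim_values: total}} for every subset of dims."""
    cube = {}
    for k in range(len(dims), -1, -1):
        for kept in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in kept)
                totals[key] += row[-1]  # the last field is the measure
            cube[tuple(dims[i] for i in kept)] = dict(totals)
    return cube

cube = build_cube(rows, dims)
print(len(cube))           # one cuboid per subset of the dimensions
print(cube[("region",)])   # aggregated over product and year
print(cube[()])            # the grand total
```

With three dimensions there are eight cuboids: the base cuboid keeps all three, and each of the others aggregates one or more of them away.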

What is Database Link?
A database link is a named object that describes a “path” from one database to another.

What is an Index Segment?
Each Index has an Index segment that stores all of its data.

What is Archived Redo Log?
The Archived Redo Log consists of Redo Log files that have been archived before being reused.

I have two Universes created from two different databases. Can we join them at the Designer and Report level? How?
We can link one universe to another universe in Universe parameters.

What are the differences between star and snowflake schema?
Star schema: A single fact table with N number of dimensions. Snowflake schema: Any dimensions with extended dimensions are known as a snowflake schema.

What is the Difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model, while a dimensional model has only a physical model. E-R modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.

How can you implement many-to-many relations in a star schema model?
Many-to-many relations can be implemented by using a snowflake schema, with a maximum of n dimensions.

What are Private Synonyms?
A private synonym can be accessed only by its owner.

What is a Redo Log?
The set of Redo Log files for a database is collectively known as the database's redo log.

What are the steps involved in Database Shutdown?
Close the Database; Dismount the Database and Shutdown the Instance.

For a faster process, what will we do with the Universe?
For a faster process, create aggregate tables and write better SQL so that queries run faster.

What are Data Marts?
A data mart is a collection of tables focused on a specific business group/department. It may be multi-dimensional or normalized. Data marts are usually built from a bigger data warehouse or from operational data.

What is a conformed fact?
Conformed dimensions are dimensions that can be used across multiple data marts in combination with multiple fact tables. Likewise, a conformed fact is a measure whose definition and units are the same in every data mart that uses it.

What is a critical column?
Let us take an example. Suppose ‘XYZ’ is a customer in Bangalore who has lived in the city for the last 5 years, and in that period he has made purchases worth 3 lakhs. Now he moves to ‘HYD’. When you update the city of ‘XYZ’ to ‘HYD’ in your warehouse, all of his purchases will show under the city ‘HYD’ only. This makes the warehouse inconsistent. Here CITY is the critical column. The solution is to use a surrogate key.

What is a Hash Cluster?
A row is stored in a hash cluster based on the result of applying a hash function to the row’s cluster key value. All rows with the same hash key value are stored together on disk.

What are the types of Synonyms?
There are two types of synonyms: private and public.

What are the advantages of operating a database in ARCHIVELOG mode over operating it in NO ARCHIVELOG mode?
Complete database recovery from disk failure is possible only in ARCHIVELOG mode. Online database backup is possible only in ARCHIVELOG mode.

What is type 2 version dimension?
A version dimension is an SCD Type 2 dimension. It is widely used in practice because it maintains both the current data and the full historical data.
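
A Type 2 change can be sketched with Python's sqlite3 module as follows. The table dim_customer, the is_current flag, and the helper change_city are hypothetical; real implementations often also carry effective-from and effective-to dates, which are left out here for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
    customer_name TEXT,                -- natural key in this sketch
    city TEXT,
    is_current INTEGER
)
""")

def change_city(conn, name, new_city):
    """Type 2 change: close the current row, then insert a new version."""
    conn.execute(
        "UPDATE dim_customer SET is_current = 0 "
        "WHERE customer_name = ? AND is_current = 1",
        (name,),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_name, city, is_current) "
        "VALUES (?, ?, 1)",
        (name, new_city),
    )

conn.execute(
    "INSERT INTO dim_customer (customer_name, city, is_current) "
    "VALUES ('XYZ', 'Bangalore', 1)"
)
change_city(conn, "XYZ", "HYD")

history = conn.execute(
    "SELECT customer_key, city, is_current FROM dim_customer "
    "ORDER BY customer_key"
).fetchall()
print(history)
```

Facts recorded before the move keep pointing at the old customer_key, so earlier purchases still report under the old city while new ones use the new row.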

What are Fact, Dimension, and Measure?
Fact is a key performance indicator to analyze the business. Dimension is used to analyze the fact. Without dimension, there is no meaning for a fact.

What are the methodologies of Data Warehousing?
Every company has its own methodology. However, to name a few, the SDLC methodology and the AIM methodology are commonly used.

What is the main difference between the star and snowflake schema? Which one is better and why?
Only if you have a one-to-many relationship in the data do we choose the snowflake schema; performance-wise, everyone goes for the star schema. Moreover, if the ETL is concerned with reporting, choose the snowflake schema, because it provides more browsing capability than the star schema.

Describe Referential Integrity?
A rule defined on a column (or set of columns) in one table that allows the insert or update of a row only if the value for the column or set of columns (the dependent value) matches a value in a column of a related table (the referenced value). It also specifies the type of data manipulation allowed on referenced data and the action to be performed on dependent data as a result of any action on referenced data.

What are the Referential actions supported by FOREIGN KEY integrity constraint?
Update and Delete Restrict: A referential integrity rule that disallows the update or deletion of referenced data. Delete Cascade: When a referenced row is deleted, all associated dependent rows are deleted.
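
Both actions can be tried out with Python's sqlite3 module, as in the sketch below. The tables dept, emp, and project are hypothetical, and the syntax is SQLite's rather than Oracle's; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY);
CREATE TABLE emp (
    emp_id INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id) ON DELETE CASCADE
);
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id) ON DELETE RESTRICT
);
INSERT INTO dept VALUES (1), (2);
INSERT INTO emp VALUES (100, 1);
INSERT INTO project VALUES (500, 2);
""")

# Cascade: deleting dept 1 also deletes the dependent emp row.
conn.execute("DELETE FROM dept WHERE dept_id = 1")
emp_left = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

# Restrict: deleting dept 2 is refused while a project row references it.
try:
    conn.execute("DELETE FROM dept WHERE dept_id = 2")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

print(emp_left, restricted)
```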

What are the different modes of mounting a Database with the Parallel Server?
Exclusive Mode: If the first instance that mounts a database does so in exclusive mode, only that instance can mount the database. Parallel Mode: If the first instance that mounts a database is started in parallel mode, other instances that are started in parallel mode can also mount the database.

What is unit testing?
In unit testing, the developer independently tests the mapping that he or she has created.

What are the different types of data warehousing?

Types of data warehousing are:

1. Enterprise Data warehousing

2. ODS (Operational Data Store)

3. Data Mart

What is BUS Schema?
A BUS Schema is composed of a master suite of conformed dimensions and standardized definitions of facts.

What is the difference between dependent data warehouse and independent data warehouse?
Dependent departments are those that depend on a data warehouse for their data. Independent departments are those that get their data directly from the operational data sources in the organization.

What is schema?
A schema is a collection of database objects of a User.

Does a View contain Data?
Views do not contain or store data.

Can Full Backup be performed when the database is open?

What is Informatica Architecture?
Informatica Architecture contains the Repository, Repository Server, Repository Server Administration Console, sources, and the data warehouse, and it has the Designer, Workflow Manager, and Workflow Monitor. The combination of all these is called the Informatica Architecture.

What do you mean by static and local variable?
A static variable is not created on the function stack but in the initialized data segment, and hence the variable can be shared across multiple calls of the same function. Usage of static variables within a function is not thread-safe. On the other hand, a local (auto) variable is created on the function stack, is valid only in the context of the function call, and is not shared across function calls.

What is Data warehousing Hierarchy?

Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure. Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies: one for product categories and one for product suppliers. Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse. When designing hierarchies, you must consider the relationships in business structures. Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
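
The month-to-quarter-to-year example can be sketched in plain Python. The sample figures and the helper roll_up are hypothetical; the point is only that each level's values are obtained by summing the values of its children.

```python
from collections import defaultdict

# Hypothetical month-level sales: (year, month) -> amount.
monthly = {(2018, 12): 7, (2019, 1): 10, (2019, 2): 20, (2019, 4): 5}

def roll_up(values, parent_of):
    """Aggregate child-level values into their parent level."""
    totals = defaultdict(int)
    for key, amount in values.items():
        totals[parent_of(key)] += amount
    return dict(totals)

# Month -> quarter: months 1-3 map to quarter 1, 4-6 to quarter 2, and so on.
quarterly = roll_up(monthly, lambda ym: (ym[0], (ym[1] - 1) // 3 + 1))
# Quarter -> year.
yearly = roll_up(quarterly, lambda yq: yq[0])

print(quarterly)
print(yearly)
```

Drilling down is the reverse walk: starting from a year total, a query tool shows the quarters that are its children, then the months under each quarter.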

Which technology should be used for interactive data querying across multiple dimensions for a decision making for a DW?

What is a Table?
A table is the basic unit of data storage in an Oracle database. The tables of a database hold all of the user accessible data. Table data is stored in rows and columns.

What is the use of Control File?
When an instance of an Oracle database is started, its control file is used to identify the database and redo log files that must be opened for database operation to proceed. It is also used in database recovery.

What are the steps involved in Instance Recovery?

a) Rolling forward to recover data that has not been recorded in the data files yet has been recorded in the on-line redo log, including the contents of rollback segments.

b) Rolling back transactions that have been explicitly rolled back or have not been committed, as indicated by the rollback segments regenerated in step a.

c) Releasing any resources (locks) held by transactions in process at the time of the failure.

d) Resolving any pending distributed transactions undergoing a two-phase commit at the time of the instance failure.
