Filter Transformation Overview:
1. What is a filter transformation?
A filter transformation is used to filter out the rows in mapping. The filter transformation allows the rows that meet the filter condition to pass through and drops the rows that do not meet the condition. Filter transformation is an active transformation.
2. Can we specify more than one filter condition in a filter transformation?
We can only specify one condition in the filter transformation. To specify more than one condition, we have to use router transformation?
3. In which case a filter transformation acts as passive transformation?
If the filter condition is set to TRUE, then it passes all the rows without filtering any data. In this case, the filter transformation acts as passive transformation.
4. Can we concatenate ports from more than one transformation into the filter transformation?
No. The input ports for the filter must come from a single transformation.
5. How to filter the null values and spaces?
Use the ISNULL and IS_SPACES functions
Example: IIF(ISNULL(commission),FALSE,TRUE)
6. How session performance can be improved by using filter transformation?
Keep the filter transformation as close as possible to the sources in the mapping. This allows the unwanted data to be discarded and the integration service processes only the required rows. If the source is relational source, use the source qualifier to filter the rows.
A filter transformation is used to filter out the rows in mapping. The filter transformation allows the rows that meet the filter condition to pass through and drops the rows that do not meet the condition. Filter transformation is an active transformation.
2. Can we specify more than one filter condition in a filter transformation?
We can only specify one condition in the filter transformation. To specify more than one condition, we have to use router transformation?
3. In which case a filter transformation acts as passive transformation?
If the filter condition is set to TRUE, then it passes all the rows without filtering any data. In this case, the filter transformation acts as passive transformation.
4. Can we concatenate ports from more than one transformation into the filter transformation?
No. The input ports for the filter must come from a single transformation.
5. How to filter the null values and spaces?
Use the ISNULL and IS_SPACES functions
Example: IIF(ISNULL(commission),FALSE,TRUE)
6. How session performance can be improved by using filter transformation?
Keep the filter transformation as close as possible to the sources in the mapping. This allows the unwanted data to be discarded and the integration service processes only the required rows. If the source is relational source, use the source qualifier to filter the rows.
7. What is the Difference between source qualifier filter
and Filter Transformation?
Source qualifier filter only used for
relation source where as filter used any kind of source like Relational table
and flat file
Source qualifier filters data while
reading where as filter before loading into the Target.
Difference between Router and Filter transformation in Informatica:
Router
|
Filter
|
Router transformation provides us the facility to capture the
rows of data that do not meet any of the conditions to a default output
group.
|
A Filter transformation tests data for one condition and drops
the rows of data that do not meet the condition.
|
Router transformation is a single input and multi output group
transformation.
|
Filter is single input and single output group transformation.
|
In a router transformation, you can specify more than one filter
condition.
|
Filter transformation, you can specify only one filter
condition.
|
The router transformation does not block input rows and those
records that failed the filter condition will be passed to the default group
|
In a filter transformation there is chance that records get
blocked
|
Router transformation acts like IIF condition in informatica or
CASE.. WHEN in database.
|
Filter transformation works as WHERE clause of SQL .
|
Advantages of Router Transformation over Filter Transformation
- Better Performance; because in mapping, the Router transformation Informatica server processes the input data only once instead of as many times as you have conditions in Filter transformation.
- Less complexity; because we use only one Router transformation instead of multiple Filter transformation.
- Router transformation is more efficient than the Filter transformation.
For Eg:
We have 3 departments in source now we want to send these records into 3 tables.To achieve this we require only one Router transformation.In case we want to get same result with Filter transformation then we require at least 3 Filter transformations.
Similarity:
A Router and Filter transformation are almost same because both transformations allow you to use a condition to test data.
What is Transformation:
A Transformation is a power center object which allows you to
develop the business rules to process the Data through ports that we connect to
mapping or mapplet.
Two types of Transformations
Active
Transformation.
Passive Transformation
Active Transformation:
A
Transformation which can change the No of rows or affect the no rows when the
data is moving from source to target [destination] is Called Active
Transformation.
Ø Filter Transformation.
Ø Router Transformation.
Ø Source qualifier Transformation
Ø Aggregator Transformation.
Ø Sorter Transformation.
Ø Joiner Transformation
Ø Rank Transformation.
Ø Union Transformation.
Ø Update strategy Transformation
Ø Normalizer Transformation.
Ø Transaction control Transformation.
Ø SQL Transformation
Passive
Transformation:
A
transformation does not change the number of rows .when data is moving from
source to destination is called Passive Transformation.
Ø Expression Transformation.
Ø Sequence generator Transformation.
Ø Lookup Transformation.
Ø Stored Procedure Transformation.
Ø SQL Transformation.
Ø XMK Source qualifier Transformation
All Active Transformations are used to filtering and cleansing the
data.
TARGET UPDATE OVERRIDE - INFORMATICA :
When you used an update strategy transformation in the mapping or specified the "Treat Source Rows As" option as update, informatica integration service updates the row in the target table whenever there is match of primary key in the target table found.
The update strategy works only
1) When there is primary key defined in the target definition.
2) When you want update the target table based on the primary key.
What if you want to update the target table by a matching column other than the primary key? In this case the update strategy won’t work. Informatica provides feature, "Target Update Override", to update even on the columns that are not primary key.
You can find the Target Update Override option in the target definition properties tab. The syntax of update statement to be specified in Target Update Override is
Syntax:
UDATE TARGET_TABLE_NAME
SET TARGET_COLUMN1 = :TU.TARGET_PORT1,
[Additional update columns]
WHERE TARGET_COLUMN = :TU.TARGET_PORT
AND [Additional conditions]
Here TU means target update and used to specify the target ports.
Example: Consider the employees table as an example. In the employees table, the primary key is employee_id. Let say we want to update the salary of the employees whose employee name is MARK. In this case we have to use the target update override. The update statement to be specified is
UPDATE EMPLOYEES
SET SALARY = :TU.SAL
WHERE EMPLOYEE_NAME = :TU.EMP_NAME
Types of Lookup Caches in Informatica:
Static cache:
Static Cache is same as a Cached Lookup in which once a Cache is created the Integration Service always queries the Cache instead of the Lookup Table.
In Static Cache when the Lookup condition is true it return value from lookup table else returns Null or Default value. In Static Cache the important thing is that you cannot insert or update the cache.
Dynamic cache:
In Dynamic Cache we can insert or update rows in the cache when we pass the rows. The Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target
Shared cache:
When we use shared Cache Informatica server creates the cache memory for multiple lookup transformations in the mapping and once the lookup is done for the first lookup then memory is released and use that memory used by the other look up transformation.
We can share the lookup cache between multiple transformations. Unnamed cache is shared between transformations in the same mapping and named cache between transformations in the same or different mappings.
Persistent cache:
If we use Persistent cache Informatica server processes a lookup transformation and saves the lookup cache files and reuses them the next time. The Integration Service saves or deletes lookup cache files after a successful session run based on whether the Lookup cache is checked as persistent or not
In order to make a Lookup Cache as Persistent cache you need to make the following changes
Lookup cache persistent: Needs to be checked
Cache File Name Prefix: Enter the Named Persistent cache file name
Re-cache from lookup source: Needs to be checked
Recache from database:
If the persistent cache is not synchronized with the lookup table you can configure the lookup transformation to rebuild the lookup cache.
Lookup Caches in Informatica
|
Static cache
|
Dynamic cache
| |
Shared cache
| |
Persistent cache
|
Static cache:
Static Cache is same as a Cached Lookup in which once a Cache is created the Integration Service always queries the Cache instead of the Lookup Table.
In Static Cache when the Lookup condition is true it return value from lookup table else returns Null or Default value. In Static Cache the important thing is that you cannot insert or update the cache.
Dynamic cache:
In Dynamic Cache we can insert or update rows in the cache when we pass the rows. The Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target
Shared cache:
When we use shared Cache Informatica server creates the cache memory for multiple lookup transformations in the mapping and once the lookup is done for the first lookup then memory is released and use that memory used by the other look up transformation.
We can share the lookup cache between multiple transformations. Unnamed cache is shared between transformations in the same mapping and named cache between transformations in the same or different mappings.
Persistent cache:
If we use Persistent cache Informatica server processes a lookup transformation and saves the lookup cache files and reuses them the next time. The Integration Service saves or deletes lookup cache files after a successful session run based on whether the Lookup cache is checked as persistent or not
In order to make a Lookup Cache as Persistent cache you need to make the following changes
Lookup cache persistent: Needs to be checked
Cache File Name Prefix: Enter the Named Persistent cache file name
Re-cache from lookup source: Needs to be checked
Recache from database:
If the persistent cache is not synchronized with the lookup table you can configure the lookup transformation to rebuild the lookup cache.
1) How can you recognise whether or not the
newly added rows in the source are gets insert in the target?
ans) In the Type2 maping we have three
options to recognise the newly added rows
Version number
Flagvalue
Effective date Range.
2) Where to store informatica rejected
data? How to extract the informatica rejected data?
ans)
The
reject rows say for example due to unique key constrain is all pushed by
session into the $PMBadFileDir (default relative path is
<INFA_HOME/PowerCenter/server/infa_shared/BadFiles) which is configured with
path at Integration Service level. Every Target will have property saying
Reject filename which gives the file in which rejects rows are stored.
3) When does u we use dynamic cache and
when do we use static cache in an connected and unconnected lookup
transformation?
ans)
We
use dynamic cache only for connected lookup. We use dynamic cache to check
whether the record already exists in the target table are not. And depending on
that, we insert, update or delete the records using update strategy. Static
cache is the default cache in both connected and unconnected. If u select
static cache on lookup table in infa, it own't update the cache and the row in
the cache remain constant. We use this to check the results and also to update
slowly changing records
4) After draging the ports of three sources
(sql server, oracle, informix) to a single source qualifier, can u map these three ports directly to target?
ans)
If
u drag three hetrogenous sources and populated to target without any join means
you are entertaining Carteisn product. If you don't use join means not only
diffrent sources but homegeous sources are show same error.
If you are not interested to use joins
at source qualifier level u can add some joins sepratly.
5) What are the real time problems generally come
up while doing/running mapping/any transformation? Can anybody explain with
example?
ans)
May be you will encounter with connection failure,
other then that i don't think so, cuzserver will handle all the syntex errors
and Invalid mappings.
Here are few real time examples of
problems while running informatics mappings:
1) Informatica uses ODBC connections to
connect to the databases.
The database passwords (production) is
changed in a periodic
Manner and the same is not updated at
the Informatica side.
Your mappings will fail in this case and
you will get database connectivity error.
2) If you are using Update strategy
transformation in the mapping, in the session properties
You have to select Treat Source Rows: Data Driven. If we do not select this
Informatica
Server will ignore updates and it only inserts rows.
3) If we have mappings loading multiple
target tables we have to provide the Target Load Plan
in the sequence we want them to get
loaded.
4) Error: Snapshot too old is a very
common error when using Oracle tables. We get this error
While using too large tables. Ideally we
should schedule these loads when server is not very
Busy (meaning when no other loads are
running).
5) We might get some poor performance
issues while reading from large tables. All the source tables
Should be indexed and updated regularly.
6) How do we do unit testing in informatica?
How do we load data in informatica?
ans) Unit testing are of two types
1. Quantitaive testing
2. Qualitative testing
Steps.
1. First validate the mapping
2. Create session on the mapping and
then run workflow.
Once the session is succeeded the right
clicks on session and go for statistics tab.
There you can see how many number of
source rows are applied and how many number of rows loaded in to targets and
how many number of rows rejected. This is called Quantitative testing.
If once rows are successfully loaded
then we will go for qualitative testing.
Steps
1. Take the DATM (DATM means where all
business rules are mentioned to the corresponding source columns) and check
whether the data is loaded according to the DATM in to target table. If any
data is not loaded according to the DATM then go and check in the code and
rectify it.
This is called Qualitative testing.
This is what a developer will do in Unit
Testing.
7) Differences between Normalizer and
Normalizer transformation?
ans) Normalizer:
It is a transormation mainly using for cobol sources,
it's
change the rows into Coolum’s and columns into rows
Normalization:
To remove the retundancy and inconsitecy
Normalizer
Transformation can be used to obtain multiple columns from a single row.
8) How do you handle decimal places while
importing a flat file into informatica?
ans) while importing flat file definition
just specify the scale for a numeric data type. in the mapping, the flat file
source supports only number data type(no decimal and integer). In the SQ
associated with that source will have a data type as decimal for that number
port of the source?
source ->number data type port
->SQ -> decimal datatype.Integer is not supported. Hence decimal is taken
care.
Import the field as string and then use
expression to convert it, so that we can avoid truncation if decimal places in
source itself.
9) How do you decide whether you need to do
aggregations at database level or at Informatica level?
ans) It
depends upon our requirement only. If you have good processing database you can
create aggregation table or view at database level else it’s better to use
informatica. Here I’m explain why we need to use informatica.
whatever it may be informatica is a third
party tool, so it will take more time to process aggregation compared to the
database, but in Informatica an option we called "Incremental
aggregation" which will help you to update the current values with current
values +new values. No necessary to process entire values again and again. Unless
this can be done if nobody deleted that cache files. If that happend total
aggregation we need to execute on informatica also.
In database we don't have Incremental
aggregation facility.
10) Is sorter an active or passive transformation?
What happens if we uncheck the distinct option in sorter? Will it be under active or
passive transformation?
ans) Sorter is an active transformation.
if you don't check the distinct option it is considered as a passive
transformation. because this distinct option eliminates the duplicate records
from the table.
by default every transformation is
passive, after applying any condition or logic then only its becoming like
active.
11) How the informatica server sorts the
string values in Rank transformation?
ans) When Informatica Server runs in UNICODE
data movement mode ,then it uses the sort order configured in session
properties.
12) Why is meant by direct and indirect
loading options in sessions?
ans) Direct loading can be used to
Single transformation where as indirect transformation can be used to multiple
transformations or files
In the direct we can perform recovery
process
but in Indirect we cant do it .
13) Explain about Informatica server?
Ans) Informatica
server,load manager/rs,data transfer manager,reader,temp server and writer are
the components of informatica server. first load manager sends a request to the
reader if the reader is ready to read the data from source and dump into the
temp server and data transfer manager manages the load and it send the request
to writer as per first in first out process and writer takes the data from temp
server and loads it into the target
14) When we create a target as flat file
and source as oracle.. How can i specify first rows as column names in flat
files...?
ans) use a pre sql statement....but this
is a hardcoding method...if you change the column names or put in extra columns
in the flat file, you will have to change the insert statement
You can also achieve this by changing
the setting in the Informatica Repository manager to display the columns
heading. The only disadvantage of this is that it will be applied on all the
files that will be generated by this server
15) How to use the unconnected lookup i.e.,
from where the input has to be taken and the output is linked?
What condition is to be given?
ans) The unconnected lookup is used just
like a function call. in an expression output/variable port or any place where
an expression is accepted(like condition in update strategy etc..), call the
unconnected lookup...something like :LKP.lkp_abc(input_port)....... (Lkp_abc is
the name of the unconnected lookup...(plz check the exact syntax)).....give the
input value just like we pass parameters to functions, and it'll return the
output after looking up.
16) Can Informatica be used as a Cleansing
Tool? If yes, give example of transformations that can implement a data cleansing
routine.
ans) Yes,
we can use Informatica for cleansing data. Some time we use stages to cleansing
the data. It depends upon performance again else we can use expression to
cleasing data.
For example an feild X have some values
and other with Null values and assigned to target feild where target feild is not
null column, inside an expression we can assign space or some constant value to
avoid session failure.
The input data is in one format and
target is in another format, we can change the format in expression.
We can assign some default values to the
target to represent complete set of data in the target.
17) Is a fact table normalized or
de-normalized?
ans) A
fact table is always DENORMALISED table. It consists of data from dimension
table (Primary Key's) and Fact
table has foreign keys and measures.
18) How can you delete duplicate rows without
using Dynamic Lookup? Tell me any other ways using lookup delete the duplicate rows?
ans) For
example u have a table Emp_Name and it has two columns Fname, Lname in the
source table which has douplicate rows. In the mapping Create Aggregator
transformation. Edit the aggregator transformation select Ports tab select
Fname then click the check box on Group By and uncheck the (O) out port. Select
Lname then uncheck the (O) out port and click the check box on Group By. Then
create 2 new ports Uncheck the (I) import then click Expression on each port.
In the first new port Expression type Fname. Then second Newport type Lname.
Then close the aggregator transformation link to the target table.
19) What does the expression n filter
transformations do in Informatica Slowly growing target wizard?
ans) EXPESSION
transformation detects and flags the rows from source.
Filter transformation filters the rows that are not flagged and passes
the flagged rows to the Update
strategy transformation
20) Difference between Rank and Dense Rank?
ans) Rank:
1
2<--2nd position
2<--3rd position
4
5
Same Rank is assigned to same
totals/numbers. Rank is followed by the Position. Golf game usually Ranks this
way. This is usually a Gold Ranking.
Dense Rank:
1
2<--2nd position
2<--3rd position
3
4
Same ranks are assigned to same
totals/numbers/names. The next rank follows the serial number.
21) What is the method of loading 5 flat
files of having same structure to a single target and which transformations I can use?
ans) union transformation,
Otherwise write all file paths of five files in one file and use this
file in session properties as indirect
(or)
Use file repository concept (OR)
Two Methods.
1.write all files in one directory then use file repository concept(don’t
forget to type source file type
as indirect in the session).
2. use union t/r to combine multiple input files into a single target.
22) Why we use stored procedure
transformation?
ans) A Stored Procedure transformation
is an important tool for populating and maintaining databases. Database administrators create stored
procedures to automate time-consuming tasks that are too complicated for standard SQL
statements.
23) What are two modes of data movement in
Informatica Server?
ans)
The data movement mode depends on whether Informatica Server should
process single byte or
multi-byte character data. This mode selection can affect the enforcement
of code page relationships and code page validation in the Informatica
Client and Server.
a) Unicode - IS allows 2 bytes for each character and uses additional
byte for each non-ascii
character (such as Japanese characters)
b) ASCII - IS holds all data in a single byte.
The IS data movement mode can be
changed in the Informatica Server configuration parameters. This comes into effect once you
restart the Informatica Server.
24) Can anyone explain error handling in
informatica with examples so that it will be easy to explain the same in the interview?
ans) Go to the session log file there we
will find the information regarding to the
Session initiation process,
Errors encountered.
Load summary.
so by seeing the errors encountered during the session running, we can
resolve the errors.
25) What is change data capture?
ans) Change data capture (CDC) is a set
of software design patterns used to determine the data that has changed in a database so that action
can be taken using the changed data.
26) Waht are main advantages and purpose of
using Normalizer Transformation in Informatica?
ans) Normalizer
Transformation is used mainly with COBOL sources where most of the time data
is stored in de-normalized
format. Also, Normalizer transformation can be used to create multiple rows from a single row of data
1) Normalizer Transformation read the data from COBOL Sources
2) It support Horizontal Pivot .It is a processing of single input into
a multiple output
27) What is the procedure to load the fact table.
Give in detail?
ans) Based
on the requirement to your fact table, choose the sources and data and
transform it based on your
business needs. For the fact table, you need a primary key so use a sequence generator transformation to generate a
unique key and pipe it to the target (fact) table with the foreign keys from the source tables.
28) How to delete duplicate rows in flat
files source is any option in informatica?
ans) Use a sorter transformation , in
that u will have a "distinct" option make use of it .
29) How can we partition a session in
Informatica?
ans) The Informatica? PowerCenter?
Partitioning option optimizes parallel processing on multiprocessor hardware by providing a thread-based
architecture and built-in data partitioning.
GUI-based tools reduce the
development effort necessary to create data partitions and streamline ongoing troubleshooting and
performance tuning tasks, while ensuring data integrity throughout the execution process. As the
amount of data within an organization expands and real-time demand for information grows, the PowerCenter
Partitioning option
enables hardware and applications to provide outstanding performance and
jointly scale to handle large
volumes of data and users.
30) If you want to create indexes after the
load process which transformation you choose?
ans)
Its usually not done in the mapping(transformation) level. It’s done in session
level. Create a command
task which will execute a shell script (if Unix) or any other scripts which
contains the create index
command. Use this command task in the workflow after the session or else, you
can create it with a post
session command.
Use unconnected stored procedure.
In stored procedure type=target post-load.
Write procedure in target database.
31) How can you access the remote source
into your session?
ans) Relational source: To access
relational source which is situated in a remote place ,u need to configure database connection to the data
source.
FileSource : To access the remote source file you must configure the FTP
connection to the host machine
before you create the session.
Hetrogenous: When U?r maping contains more than one source type, the
server manager creates a
hetrogenous session that displays source options for all types.
32) In update strategy target table or flat
file? Which gives more performance? Why?
ans) Pros:
Loading, Sorting, Merging operations will be faster as there is no index
concept and Data will be in
ASCII mode.
Cons: There is no concept of updating existing records in flat file.
As there is no indexes, while lookups speed will be lesser.
33) How the informatica server increases
the session performance through partitioning the source?
ans) For
a relational sources informatica server creates multiple connections for each
parttion of a single source and
extracts seperate range of data for each connection.
Informatica server reads multiple partitions of a single source
concurently.Similarly for loading also
informatica server creates multiple connections to the target and loads
partitions of data concurrently.
For XML and file
sources,informatica server reads multiple files concurently.For loading the
data informatica server creates a
seperate file for each partition(of a source file). You can choose to merge the targets.
34) How can you say that union
Transormation is Active transformation?
ans)
By Definition, Active transformation is the transformation that changes the
number of rows that pass through
it...in union transformation the number of rows resulting from union can be
(are) different from the actual number of
rows.
As we are combining results of two select queries using Union Tr Most
probably no of rows increases.So it
is An Active Tr.
34) What is meant by complex mapping?
ans) Complex
maping means involved in more logic and more business rules.
Actually
in my project complex mapping is
In
my bank project, I involved in construct a 1 data ware house
Many
customer is there in my bank project, They r after taking loans relocated in to
another place
That
time i feel to diffcult maintain both prvious and current addresses
in
the sense i am using scd2
This
is an simple example of complex mapping
35) what r the options in the target
session of update strategy transformation?
ans) Update
as Insert:
This
option specified all the update records from source to be flagged as inserts in
the target. In other words,
instead of updating the records in the target they are inserted as new records.
Update
else Insert:
This option enables informatica to
flag the records either for update if they are old or insert, if they are new records from source.
Insert, update, delete, insert as update, update else insert.update as
update.
36) Why did you use stored procedure in
your ETL Application?
ans)
usage of stored procedure has the following advantages
1checks
the status of the target database
2drops
and recreates indexes
3determines
if enough space exists in the database
4performs
specialized calculation
37) What are the different options used to
configure the sequential batches?
ans) Two
options
Run
the session only if previous session completes sucessfully. Always runs the
session.
38) Features in 7.1 are:
1. Union and custom transformation
2. Lookup on flat file
3. Grid servers working on different
operating systems can coexist on same server
4. We can use pmcmdrep
5. We can export independent and
dependent rep objects
6. We can move mapping in any web
application
7. Version controlling
8. Data profiling.
38)When
to use Abort, Decode functions?
Ans: Abort can be used to Abort / stop the session on an error condition. If the primary key column contains NULL, and you need to stop the session from continuing then you may use ABORT function in the default value for the port. It can be used with IIF and DECODE function to Abort the session.
39)What is constraint based loading?
Ans: Constraint based loading. the data was loaded into the target table based on the Constraints.i.e if we want to load the EMP&DEPT data, first it loads the data of DEPT then EMP because DEPT is PARENT table EMP is CHILD table. In simple terms, it loads PARENT table first then CHILD table.
39)In a joiner transformation, you should specify the source with fewer rows as the master source. Why?
Ans: Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
40)What is Incremental Aggregation?
Ans: Whenever a session is created for a mapping Aggregate Transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.
41)What is the default join that source qualifier provides?
Ans: Inner equi join.
42)What are the options in the target session of update strategy transformations?
Ans:
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table
43)Which transformation should we use to normalize the COBOL and relational sources?
Ans: Normalizer Transformation.
44)What is the use of tracing levels in transformation?
Ans: Tracing levels store information about mapping and transformations.
45)What are the basic needs to join two sources in a source qualifier?
Ans: Two sources should have primary and foreign key relationships. Two sources should have matching data types.
46)What is the use of control break statements?
Ans: They execute a set of codes within the loop and endloop.
Ans: Abort can be used to Abort / stop the session on an error condition. If the primary key column contains NULL, and you need to stop the session from continuing then you may use ABORT function in the default value for the port. It can be used with IIF and DECODE function to Abort the session.
39)What is constraint based loading?
Ans: Constraint based loading. the data was loaded into the target table based on the Constraints.i.e if we want to load the EMP&DEPT data, first it loads the data of DEPT then EMP because DEPT is PARENT table EMP is CHILD table. In simple terms, it loads PARENT table first then CHILD table.
39)In a joiner transformation, you should specify the source with fewer rows as the master source. Why?
Ans: Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
40)What is Incremental Aggregation?
Ans: Whenever a session is created for a mapping Aggregate Transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.
41)What is the default join that source qualifier provides?
Ans: Inner equi join.
42)What are the options in the target session of update strategy transformations?
Ans:
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table
43)Which transformation should we use to normalize the COBOL and relational sources?
Ans: Normalizer Transformation.
44)What is the use of tracing levels in transformation?
Ans: Tracing levels store information about mapping and transformations.
45)What are the basic needs to join two sources in a source qualifier?
Ans: Two sources should have primary and foreign key relationships. Two sources should have matching data types.
46)What is the use of control break statements?
Ans: They execute a set of codes within the loop and endloop.
47)What
are the types of loading in Informatica?
Ans: There are two types of loading, normal loading and bulk loading. In normal loading, it loads record by record and writes log for that. It takes comparatively a longer time to load data to the target in normal loading. But in bulk loading, it loads number of records at a time to target database. It takes less time to load data to target.
48)What is aggregate cache in aggregator transformation?
Ans: The aggregator stores data in the aggregate cache until it completes aggregate calculations. When you run a session that uses an aggregator transformation,the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
49)How Union Transformation is used?
Ans: The union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like UNION ALL statement in SQL, that is used to combine result set of two SELECT statements.
50)Can two flat files be joined with Joiner Transformation?
Ans: Yes, joiner transformation can be used to join data from two flat file sources.
51)Which transformation should you need while using the Cobol sources as source definitions?
Ans: Normalize transformation which is used to normalize the data. Since cobol sources are oftenly consists of Demoralized data.
52)What are the types of data that passes between informatica server and stored procedure?
Ans: 3 types of data
• Input/Out put parameters
• Return Values
• Status code.
53)What is the status code?
Ans: Status code provides error handling for the informatica server during the session. The stored procedure issues a status code that notifies whether or not stored procedure completed sucessfully.This value can not seen by the user. It only used by the informatica server to determine whether to continue running the session or stop.
Ans: There are two types of loading, normal loading and bulk loading. In normal loading, it loads record by record and writes log for that. It takes comparatively a longer time to load data to the target in normal loading. But in bulk loading, it loads number of records at a time to target database. It takes less time to load data to target.
48)What is aggregate cache in aggregator transformation?
Ans: The aggregator stores data in the aggregate cache until it completes aggregate calculations. When you run a session that uses an aggregator transformation,the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
49)How Union Transformation is used?
Ans: The union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like UNION ALL statement in SQL, that is used to combine result set of two SELECT statements.
50)Can two flat files be joined with Joiner Transformation?
Ans: Yes, joiner transformation can be used to join data from two flat file sources.
51)Which transformation should you need while using the Cobol sources as source definitions?
Ans: Normalize transformation which is used to normalize the data. Since cobol sources are oftenly consists of Demoralized data.
52)What are the types of data that passes between informatica server and stored procedure?
Ans: 3 types of data
• Input/Out put parameters
• Return Values
• Status code.
53)What is the status code?
Ans: Status code provides error handling for the informatica server during the session. The stored procedure issues a status code that notifies whether or not stored procedure completed sucessfully.This value can not seen by the user. It only used by the informatica server to determine whether to continue running the session or stop.
54)What
is the use of the User defined Environment variables?
User
Defined Environment Variables are used to pass the parameters in job
level,stage level and sequncer job level
55)
Describe the stages of extract / transform process, including what should
happen at each stage and in what order?
Extraction means getting data from different/single
source and loading into the staging database, in other words brining the data from different sources (like
databases or flat diles) into the datastage(or any ETL tool) for further
processing.
In the extraction we will ensure the null ability and cleansing the data.
Transformation: where we have to implement the following:
1.the business logics to the necessary fields,
2.populate the primary and foreign keys,
3.aggrecation of data.
4. Data conversion if necessary
5.implmentation of SCD's
In the extraction we will ensure the null ability and cleansing the data.
Transformation: where we have to implement the following:
1.the business logics to the necessary fields,
2.populate the primary and foreign keys,
3.aggrecation of data.
4. Data conversion if necessary
5.implmentation of SCD's
56)
what are the roles and responsibilities of an ETL developer/designer/team lead?
The
ETL developer's responsibility is to ensure that the goals of the business with regards to
data
management are met. The developer should be able to design and
implement a plan for all types of data movement or modification requested and
approved by management. He/she should have intimate knowledge of the database
schema and the data itself. The ETL developer has a full understanding of each
step in the ETL process and can write ETL programs with scripts packages
mappings etc. (depending on the tool). Analysis development QA implementation
monitoring and administration of the ETL process
57) When can we use stored
procedure transformations
whenever you would like to perform some activity on
the DB after loading the data or before loading data through ETL process.
58) If i delete one record in
master table, then the same record is also deleted in dependent table, for this
which transformation can we use?
59 What
is the first one executed in a session? Is it the Truncate Target Table option
or the Post SQL?
Truncate
table will be executed first.Trunc. table gets executed before the table starts
loading and post SQL executes after the session is completed (table loaded)
60) why we implement sarrogate
key in ETL Process
ETL is not a one
time process. It is used to make periodical updates to
dimensional as well as fact data.
dimensional as well as fact data.
Usually
dimensions contains lookup data and has less tendency to change but
if it changes and you want to maintain the change history then surrogate key
will help with it
if it changes and you want to maintain the change history then surrogate key
will help with it
61) If
there are 10000 records and while loading, if the session fails in between, how
will you load the remaining data?
Using performance recovery option in Session property. Recover session
from the last commit level
62. How do you identify existing rows of data in
the target table using lookup transformation?
There are two ways to lookup the target table to verify a row exists or
not:
- Use connect dynamic cache lookup and then check
the values of NewLookuprow Output port to decide whether the incoming
record already exists in the table / cache or not.
- Use Unconnected lookup and call it from an expression
trasformation and check the Lookup condition port value (Null/ Not Null)
to decide whether the incoming record already exists in the table or not
63)What is Code Page Compatibility?
Compatibility
between code pages is used for accurate data movement when the Informatics Sever runs in the Unicode data movement mode. If the
code pages are identical, then there will not be any data loss. One code page
can be a subset or superset of another. For accurate data movement, the target
code page must be a superset of the source code page.
Superset - A code page is a superset of another code
page when it contains the character encoded in the other code page; it also
contains additional characters not contained in the other code page
Subset - A code page is a subset of another code page
when all characters in the code page are encoded in the other code page
64)What are the target options on the servers?
Target Options for File Target type are FTP File,
Loader and MQ
There are no target options for ERP target type.
Target Options for Relational are Insert, Update (as
Update), Update (as Insert), Update (else Insert), Delete, and Truncate Table.
65)What is Code Page used for?
Code Page is used to identify characters that might be
in different
languages. If you are importing Japanese data into mapping, u must select the Japanese
code page of source data.
66)What is Load Manager?
ANS) I am providing the answer which I have taken it
from Informatica 7.1.1 manual,
Ans: While running a Workflow, the Power Center
Server uses the Load Manager process and the Data Transformation Manager
Process (DTM) to run the workflow and carry out workflow tasks. When the Power
Center Server runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow
variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates
abnormally.
When the Power Center Server runs a session, the DTM
performs the following tasks:
1. Fetches session and mapping metadata from the
repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page
validation is enabled. Checks query
Conversions if data code page validation is
disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and
transformation threads to extract, transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
67) How can we store previous session logs?
ANS) Just run
the session in time stamp mode then automatically session log will not
overwrite current session log.
68) What are two types of processes that
informatica runs the session?
ANS) Load
manager Process: Starts the session, creates the DTM process, and sends
post-session email when the
session completes.
The
DTM process. Creates threads to initialize the session, read, write, and
transform data, and handle pre- and post-session operations.
69) Where is the cache stored in
informatica?
ANS) Cache is
stored in the Informatica server memory and over flowed data is stored on the
disk in file format which will
be automatically deleted after the successful completion of the session run.
If you want to store that data
you have to use a persistent cache.
70) If you have four lookup tables in the
workflow. How do you troubleshoot to improve performance?
ANS) There r many ways to improve the mapping which
has multiple lookups
1)
We can create an index for the lookup table if we have permissions(staging
area).
2) Divide the lookup mapping into two (a) dedicate one
for insert means: source - target,,
these r new rows . Only the new rows will come to mapping and the
process will be fast. (b) Dedicate the
second one to update : source=target,, these r existing rows. only the rows
which exists allready will come into the
mapping.
3)we
can increase the chache size of the lookup.
71) If the workflow has 5 session and
running sequentially and 3rd session has-been failed how can we run again from only 3rd to 5th session?
ANS) If multiple sessions in a concurrent batch
fail, you might want to truncate all targets and run the batch again. However, if a
session in a concurrent batch fails and the rest of the sessions complete successfully, you can
recover the session as a standalone session. To recover a session in a concurrent batch:1.Copy the failed
session using Operations-Copy Session.2.Drag the copied session outside the batch to be a standalone
session.3.Follow the steps to recover a standalone session.4.Delete the standalone copy.
Hi, as per the questions
all the sessions are serial. So you can start the session as "start workflow from task" from there it will
continue to run the rest of the ta
72) What are Options/Type to run a Stored
Procedure?
A:
Normal:
During a session, the stored procedure runs where the transformation exists in
the mapping on a row-by-row basis. This is useful for calling the stored
procedure for each row of data that passes through the mapping, such as running
a calculation against an input port. Connected stored procedures run only in
normal mode.
Pre-load
of the Source. Before the session retrieves data from
the source, the stored procedure runs. This is useful for verifying the
existence of tables or performing joins of data in a temporary table.
Post-load
of the Source. After the session retrieves data from
the source, the stored procedure runs. This is useful for removing temporary
tables.
Pre-load
of the Target. Before the session sends data to the
target, the stored procedure runs. This is useful for verifying target tables
or disk space on the target system.
Post-load
of the Target. After the session sends data to the
target, the stored procedure runs. This is useful for re-creating indexes on
the database.
It must contain at
least one Input and one Output port.
73) If a sequence generator (with increment
of 1) is connected to (say) 3 targets and each target uses the NEXTVAL port,
what value will each target get?
A: Each target will get the value in multiple of 3.
74) Have you used the Abort, Decode
functions?
A: Abort can be used to Abort / stop the session on
an error condition.
If the primary key column contains NULL, and you
need to stop the session from continuing then you may use ABORT function in the
default value for the port. It can be used with IIF and DECODE function to
Abort the session.
75) Have you used External loader? What is
the difference between normal and bulk loading?
A: External loader will perform direct data load to
the table/data files, bypass the SQL layer and will not log the data. During
normal data load, data passes through SQL layer, data is logged in to the
archive log file and as a result it is slow.
76) Do you enable/disable decimal
arithmetic in session properties?
A: Disabling Decimal Arithmetic will improve the
session performance but it converts numeric values to double, thus leading to
reduced accuracy.
77)What are the Privileges of Default
Repository and Extended Repository user?
A:
Ø Default
Repository Privileges
o
Use
Designer
o
Browse
Repository
o
Create
Session and Batches
Ø Extended
Repository Privileges
o
Session
Operator
o
Administer
Repository
o
Administer
Server
o
Super
User
78) How many different locks are available
for repository objects
A: There are five kinds of locks available on
repository objects:
Ø Read lock.
Created when you open a repository object in a folder for which you do not have
write permission. Also created when you open an object with an existing write
lock.
Ø Write lock. Created
when you create or edit a repository object in a folder for which you have
write permission.
Ø Execute lock.
Created when you start a session or batch, or when the Informatica Server
starts a scheduled session or batch.
Ø Fetch lock.
Created when the repository reads information about repository objects from the
database.
Ø Save lock.
Created when you save information to the repository.
79)How to improve the Session performance?
1 Run concurrent sessions
2 Partition session (Power center)
3. Tune Parameter – DTM buffer pool, Buffer block size, Index cache size, data cache size, Commit Interval, Tracing level (Normal, Terse, Verbose Init, Verbose Data)
The session has memory to hold 83 sources and targets. If it is more, then DTM can be increased.
The informatica server uses the index and data caches for Aggregate, Rank, Lookup and Joiner
transformation. The server stores the transformed data from the above transformation in the data
cache before returning it to the data flow. It stores group information for those transformations in
index cache.
If the allocated data or index cache is not large enough to store the date, the server stores the data
in a temporary disk file as it processes the session data. Each time the server pages to the disk the
performance slows. This can be seen from the counters .
Since generally data cache is larger than the index cache, it has to be more than the index.
1 Run concurrent sessions
2 Partition session (Power center)
3. Tune Parameter – DTM buffer pool, Buffer block size, Index cache size, data cache size, Commit Interval, Tracing level (Normal, Terse, Verbose Init, Verbose Data)
The session has memory to hold 83 sources and targets. If it is more, then DTM can be increased.
The informatica server uses the index and data caches for Aggregate, Rank, Lookup and Joiner
transformation. The server stores the transformed data from the above transformation in the data
cache before returning it to the data flow. It stores group information for those transformations in
index cache.
If the allocated data or index cache is not large enough to store the date, the server stores the data
in a temporary disk file as it processes the session data. Each time the server pages to the disk the
performance slows. This can be seen from the counters .
Since generally data cache is larger than the index cache, it has to be more than the index.
4.
Remove Staging area
5.
Tune off Session recovery
6.
Reduce error tracing
80) What are tracing levels?
Normal-default
Logs
initialization and status information, errors encountered, skipped rows due to
transformation errors, summarizes session results but not at the row level.
Terse
Log
initialization, error messages, notification of rejected data.
Verbose
Init.
In
addition to normal tracing levels, it also logs additional initialization
information, names of index and data files used and detailed transformation
statistics.
Verbose
Data.
In
addition to verbose init, it records row level logs.=
When did u use in dynamic
cache? And when did u use in static cache in conn and un conn lookup tr?
We use dynamic cache only for
connected lookup. We use dynamic cache to check whether the record already
exists in the target table are not. And depending on that, we insert,update or
delete the records using update strategy. Static cache is the default cache in
both connected and unconnected. If u select static cache on lookup table in
infa, it own't update the cache and the row in the cache remain constant. We
use this to check the results and also to update slowly changing records