Filter Transformation Overview:

1. What is a filter transformation?

A filter transformation is used to filter rows in a mapping. It allows the rows that meet the filter condition to pass through and drops the rows that do not meet the condition. The filter transformation is an active transformation.

2. Can we specify more than one filter condition in a filter transformation?

We can specify only one condition in the filter transformation. To apply more than one condition, we have to use a router transformation.

3. In which case a filter transformation acts as passive transformation?

If the filter condition is set to TRUE, the transformation passes all rows without filtering any data. In this case, the filter transformation acts as a passive transformation.

4. Can we concatenate ports from more than one transformation into the filter transformation?

No. The input ports for the filter must come from a single transformation.

5. How to filter the null values and spaces?

Use the ISNULL and IS_SPACES functions in the filter condition.
Example: IIF(ISNULL(commission), FALSE, TRUE)
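As a sketch, assuming a hypothetical string port CUST_NAME, a single filter condition that drops both nulls and space-only values could be:

IIF(ISNULL(CUST_NAME) OR IS_SPACES(CUST_NAME), FALSE, TRUE)

Rows for which the expression evaluates to FALSE are dropped by the Filter transformation.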

6. How session performance can be improved by using filter transformation?

Keep the filter transformation as close to the sources as possible in the mapping. Unwanted data is then discarded early, and the Integration Service processes only the required rows. If the source is a relational source, use the source qualifier to filter the rows (see the example below).
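For instance (a sketch, assuming a hypothetical EMPLOYEES relational source with a STATUS column), the condition entered in the Source Qualifier's Source Filter property is written without the WHERE keyword:

EMPLOYEES.STATUS = 'ACTIVE'

The Integration Service appends this condition to the generated SELECT, so the unwanted rows never leave the database.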

7. What is the difference between the source qualifier filter and the Filter transformation?
The source qualifier filter can be used only for relational sources, whereas the Filter transformation can be used with any kind of source, such as relational tables and flat files.

The source qualifier filters the data while reading from the source, whereas the Filter transformation filters data within the pipeline before it is loaded into the target.

Difference between Router and Filter transformation in Informatica:

Router: The Router transformation provides a default output group that captures the rows of data that do not meet any of the conditions.
Filter: A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition.
Router: The Router transformation is a single-input, multiple-output-group transformation.
Filter: The Filter transformation is a single-input, single-output-group transformation.
Router: In a Router transformation, you can specify more than one filter condition.
Filter: In a Filter transformation, you can specify only one filter condition.
Router: The Router transformation does not block input rows; records that fail all the group conditions are passed to the default group.
Filter: In a Filter transformation, records that fail the condition are dropped.
Router: The Router transformation acts like an IIF condition in Informatica or CASE..WHEN in a database.
Filter: The Filter transformation works like the WHERE clause of SQL.

Advantages of Router Transformation over Filter Transformation
  • Better performance: in a mapping, the Integration Service processes the input data only once with one Router transformation, instead of as many times as there are Filter transformation conditions.
  • Less complexity: we use only one Router transformation instead of multiple Filter transformations.
  • The Router transformation is therefore more efficient than using several Filter transformations.
For example:
We have 3 departments in the source and want to send those records into 3 tables. To achieve this we need only one Router transformation, whereas getting the same result with Filter transformations would require at least 3 of them; the Router group conditions are sketched below.
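A sketch of the Router group filter conditions for this example, assuming a DEPTNO port with values 10, 20 and 30:

Group DEPT_10: DEPTNO = 10
Group DEPT_20: DEPTNO = 20
Group DEPT_30: DEPTNO = 30

Each group is then linked to its own target table, and any row that matches none of the conditions goes to the default group.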

Similarity:
A Router and a Filter transformation are similar because both allow you to use a condition to test data.


What is Transformation:

A transformation is a PowerCenter object that allows you to implement business rules to process data through ports, which we connect in a mapping or mapplet.
There are two types of transformations:
        Active transformation
        Passive transformation
Active Transformation:
                     A transformation that can change the number of rows when data moves from source to target (destination) is called an active transformation.
Ø  Filter Transformation.
Ø  Router Transformation.
Ø  Source qualifier Transformation
Ø  Aggregator Transformation.
Ø  Sorter Transformation.
Ø  Joiner  Transformation
Ø  Rank Transformation.
Ø  Union Transformation.
Ø  Update strategy Transformation
Ø  Normalizer Transformation.
Ø  Transaction control Transformation.
Ø  SQL Transformation
 Passive Transformation:
                    A transformation that does not change the number of rows when data moves from source to destination is called a passive transformation.
Ø  Expression Transformation.
Ø  Sequence generator Transformation.
Ø  Lookup Transformation.
Ø  Stored Procedure Transformation.
Ø  SQL Transformation.
Ø  XML Source Qualifier Transformation
Active transformations are typically used for filtering and cleansing the data.


TARGET UPDATE OVERRIDE - INFORMATICA :

When you use an Update Strategy transformation in the mapping, or set the "Treat Source Rows As" session option to Update, the Informatica Integration Service updates a row in the target table whenever a matching primary key is found in the target table.

The update strategy works only

1) When a primary key is defined in the target definition.
2) When you want to update the target table based on that primary key.

What if you want to update the target table by matching on a column other than the primary key? In this case the update strategy won't work. Informatica provides a feature, "Target Update Override", to update the target even on columns that are not part of the primary key.

You can find the Target Update Override option on the Properties tab of the target definition. The syntax of the update statement to be specified in Target Update Override is:

Syntax:

UPDATE TARGET_TABLE_NAME
SET TARGET_COLUMN1 = :TU.TARGET_PORT1,
    [Additional update columns]
WHERE TARGET_COLUMN = :TU.TARGET_PORT
AND   [Additional conditions]

Here :TU means target update and is used to reference the target ports.

Example: Consider the EMPLOYEES table, where the primary key is EMPLOYEE_ID. Say we want to update the salary of the employees whose employee name is MARK. In this case we have to use the target update override. The update statement to be specified is:


UPDATE EMPLOYEES
SET SALARY = :TU.SAL
WHERE EMPLOYEE_NAME = :TU.EMP_NAME
Types of Lookup Caches in Informatica:


Lookup Caches in Informatica
Static cache
Dynamic cache
Shared cache
Persistent cache



Static cache:

A static cache is the same as a cached lookup: once the cache is created, the Integration Service always queries the cache instead of the lookup table.
With a static cache, when the lookup condition is true the lookup returns a value from the cache; otherwise it returns NULL or the default value. The important point is that you cannot insert into or update a static cache.

Dynamic cache:

With a dynamic cache, rows can be inserted into or updated in the cache as rows pass through the lookup. The Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target, so the dynamic cache stays synchronized with the target.

Shared cache:

With a shared cache, the Informatica server builds the cache once and lets multiple lookup transformations in the mapping use it; after the first lookup builds the cache, the other lookup transformations reuse the same memory.
We can share the lookup cache between multiple transformations: an unnamed cache is shared between transformations in the same mapping, and a named cache can be shared between transformations in the same or different mappings.

Persistent cache:

With a persistent cache, the Informatica server processes the lookup transformation, saves the lookup cache files, and reuses them the next time the session runs. After a successful session run, the Integration Service saves or deletes the lookup cache files depending on whether the lookup cache is marked as persistent.
To make a lookup cache persistent, configure the following lookup properties:
Lookup cache persistent: checked
Cache File Name Prefix: the name of the named persistent cache file
Re-cache from lookup source: checked

Recache from database:


If the persistent cache is not synchronized with the lookup table you can configure the lookup transformation to rebuild the lookup cache.
Informatica Interview Questions
1) How can you recognise whether or not the newly added rows in the source get inserted into the target?
ans) In a Type 2 mapping we have three options to recognise the newly added rows:
        Version number
        Flagvalue
        Effective date Range.
2) Where does Informatica store rejected data? How do we extract the rejected data?
ans)
            Rejected rows (for example, rows rejected due to a unique key constraint) are written by the session into $PMBadFileDir (default relative path <INFA_HOME>/PowerCenter/server/infa_shared/BadFiles), which is configured at the Integration Service level. Every target has a Reject filename property that specifies the file in which the rejected rows are stored.
3) When do we use a dynamic cache and when do we use a static cache in connected and unconnected lookup transformations?
ans)
            We use a dynamic cache only with a connected lookup. We use a dynamic cache to check whether a record already exists in the target table or not, and depending on that we insert, update or delete the records using an update strategy. A static cache is the default cache for both connected and unconnected lookups. If you select a static cache on the lookup table, Informatica won't update the cache, so the rows in the cache remain constant. We use this to check the results and also to update slowly changing records.

4) After dragging the ports of three sources (SQL Server, Oracle, Informix) into a single source qualifier, can you map these ports directly to the target?
ans)
            If you drag three heterogeneous sources and populate the target without any join, you are effectively producing a Cartesian product. Without a join, not only heterogeneous sources but even homogeneous sources will show the same problem.

If you do not want to use joins at the source qualifier level, you can add the joins separately.

5) What real-time problems generally come up while building or running mappings or transformations? Can anybody explain with an example?
ans)
             You may encounter connection failures; other than that not much, because the server catches syntax errors and invalid mappings during validation.

Here are a few real-time examples of problems while running Informatica mappings:

1) Informatica uses ODBC connections to connect to the databases. The production database passwords are changed periodically, and if the same change is not made on the Informatica side, your mappings will fail with a database connectivity error.
2) If you are using an Update Strategy transformation in the mapping, in the session properties you have to set Treat Source Rows As: Data Driven. If you do not select this, the Informatica server will ignore updates and only insert rows.
3) If a mapping loads multiple target tables, we have to define the Target Load Plan in the sequence in which we want them to be loaded.
4) "Snapshot too old" is a very common error when using Oracle tables, typically with very large tables. Ideally we should schedule these loads when the server is not very busy (that is, when no other loads are running).
5) We might see poor performance while reading from large tables. All the source tables should be indexed and maintained regularly.
6) How do we do unit testing in Informatica? How do we load data in Informatica?
ans) Unit testing is of two types:
1. Quantitative testing
2. Qualitative testing
Steps:
1. First validate the mapping.
2. Create a session on the mapping and then run the workflow.
Once the session succeeds, right-click on the session and go to the statistics tab.
There you can see how many source rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called quantitative testing.
Once the rows are successfully loaded, we go for qualitative testing.
Steps:
1. Take the DATM (the document in which all business rules are mapped to the corresponding source columns) and check whether the data was loaded into the target table according to the DATM. If any data was not loaded according to the DATM, go to the code, find the issue and rectify it.
This is called qualitative testing.
This is what a developer does in unit testing.
7) What is the difference between normalization and the Normalizer transformation?
ans)      Normalizer: a transformation mainly used for COBOL sources; it changes rows into columns and columns into rows.
            Normalization: the process of removing redundancy and inconsistency from the data.
            The Normalizer transformation can be used to obtain multiple rows from a single row of data.
8) How do you handle decimal places while importing a flat file into Informatica?
ans) While importing the flat file definition, just specify the scale for the numeric data type. In the mapping, the flat file source supports only the number data type (no separate decimal or integer). The source qualifier associated with that source will have the decimal data type for that number port:
source -> number data type port -> SQ -> decimal data type. Integer is not supported separately, so decimal takes care of it.
Alternatively, import the field as a string and then use an expression to convert it, so that truncation of decimal places in the source itself is avoided.
9) How do you decide whether you need to do aggregations at database level or at Informatica level?
ans)      It depends on the requirement. If you have a database with good processing capacity, you can create the aggregation table or view at the database level; otherwise it is better to use Informatica. Here is why we might prefer Informatica:
Informatica is a third-party tool, so it can take more time to process an aggregation compared to the database. However, Informatica has an option called "Incremental Aggregation", which updates the existing aggregate values with the new values instead of processing all the values again and again. This works only as long as nobody deletes the aggregation cache files; if that happens, the total aggregation has to be executed in Informatica again.
Databases do not have an incremental aggregation facility of this kind.
10) Is sorter an active or passive transformation? What happens if we uncheck the distinct option in        sorter? Will it be under active or passive transformation?
ans) The Sorter is an active transformation. If you uncheck the distinct option it is considered a passive transformation, because it is the distinct option that eliminates duplicate records and changes the row count.
In other words, the sorter behaves like a passive transformation until logic that can change the number of rows (the distinct option) is applied, at which point it becomes active.
11) How the informatica server sorts the string values in Rank transformation?
ans)      When the Informatica server runs in Unicode data movement mode, it uses the sort order configured in the session properties.
12) What is meant by the direct and indirect loading options in sessions?
ans) Direct loading is used when the session reads a single source file, whereas indirect loading is used to read multiple files through a file list.
With direct loading we can perform the recovery process, but with indirect loading we cannot.
13) Explain about Informatica server?
Ans)    The Informatica server is made up of components such as the load manager, the data transformation manager (DTM), the reader, the temporary store and the writer. The load manager first sends a request to the reader; the reader reads the data from the source and places it in the temporary store; the data transformation manager manages the load and sends requests to the writer on a first-in, first-out basis; and the writer takes the data from the temporary store and loads it into the target.
14) When we create a flat file target with an Oracle source, how can we make the first row of the flat file contain the column names?
ans) One way is to use a pre-SQL statement, but this is a hard-coded method: if you change the column names or add extra columns to the flat file, you will have to change the statement.
You can also achieve this by changing the setting in the Informatica Repository Manager to display the column headings. The only disadvantage is that it will be applied to all files generated by this server.
15) How do we use an unconnected lookup, i.e., from where is the input taken and where is the output linked?
What condition is to be given?
ans) An unconnected lookup is used just like a function call. In an expression output/variable port, or any place where an expression is accepted (such as the condition in an update strategy), call the unconnected lookup with something like :LKP.lkp_abc(input_port), where lkp_abc is the name of the unconnected lookup. Pass the input value just as we pass parameters to a function, and it returns the output after performing the lookup, for example:
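A sketch of such a call, assuming a hypothetical unconnected lookup lkp_get_dept_name that takes a DEPT_ID input port and returns the department name, placed in an Expression transformation output port:

DEPT_NAME_OUT = :LKP.lkp_get_dept_name(DEPT_ID)

The lookup returns the value of its designated return port for the row that satisfies the lookup condition.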
16) Can Informatica be used as a Cleansing Tool? If yes, give example of transformations that can implement a data cleansing routine.
ans)      Yes, we can use Informatica for cleansing data. Sometimes we use staging tables to cleanse the data; depending on performance, we can instead use an Expression transformation to cleanse the data.
For example, if a field X has some rows with values and others with nulls, and it is mapped to a target column defined as NOT NULL, inside an expression we can assign a space or some constant value to avoid session failure, as in the sketch below.
If the input data is in one format and the target expects another, we can change the format in an expression.
We can also assign default values in the target to represent a complete set of data in the target.
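A minimal sketch of such a cleansing expression, assuming a hypothetical string input port X mapped to a NOT NULL target column:

IIF(ISNULL(X), ' ', LTRIM(RTRIM(X)))

Nulls are replaced with a single space and surrounding spaces are trimmed before the value reaches the target.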

17) Is a fact table normalized or de-normalized?
ans)      A fact table is always a DENORMALISED table. It holds the primary keys of the dimension tables as foreign keys, along with the measures.

18) How can you delete duplicate rows without using a dynamic lookup? Are there any other ways, using a lookup, to delete duplicate rows?
ans)      For example, suppose you have a source table Emp_Name with two columns, Fname and Lname, that contains duplicate rows. In the mapping, create an Aggregator transformation. Edit the Aggregator, select the Ports tab, select Fname, check Group By and uncheck the output (O) port. Select Lname, uncheck the output (O) port and check Group By. Then create two new output ports, uncheck the input (I) option on each, and in the expression of the first new port type Fname and in the second type Lname. Close the Aggregator transformation and link it to the target table. The grouping logic is sketched in SQL below.
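The grouping performed by the Aggregator is equivalent in effect to the following SQL sketch (using the Emp_Name table from the example):

SELECT Fname, Lname
FROM   Emp_Name
GROUP BY Fname, Lname

One row is returned per distinct Fname/Lname combination, which removes the duplicates.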
19) What does the expression n filter transformations do in Informatica Slowly growing target wizard?
ans)      The Expression transformation detects and flags the rows coming from the source.
         The Filter transformation filters out the rows that are not flagged and passes the flagged rows to the Update Strategy transformation.
20) Difference between Rank and Dense Rank?
ans) Rank:
1
2 <-- 2nd position
2 <-- 3rd position
4
5
The same rank is assigned to equal totals/numbers, and the next rank follows the position, so ranks can be skipped. Golf scoring usually ranks this way.
Dense Rank:
1
2 <-- 2nd position
2 <-- 3rd position
3
4
The same rank is assigned to equal totals/numbers/names, but the next rank follows the serial number, so no ranks are skipped (see the SQL sketch below).
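In SQL, the same behaviour can be illustrated with the RANK() and DENSE_RANK() analytic functions (a sketch, assuming a hypothetical SCORES table with a SCORE column):

SELECT score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM   scores

RANK() skips positions after ties, while DENSE_RANK() does not.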
21) What is the method of loading 5 flat files having the same structure into a single target, and which transformations can I use?
ans) There are two methods:
        1. Write the paths of all five files into one list file and use that file as the source, with the source filetype set to Indirect in the session properties (the file list / file repository approach).
        2. Use a Union transformation to combine the multiple input pipelines into a single target.


22) Why we use stored procedure transformation?
ans) A Stored Procedure transformation is an important tool for populating and maintaining databases.        Database administrators create stored procedures to automate time-consuming tasks that are too        complicated for standard SQL statements.
23) What are two modes of data movement in Informatica Server?
ans)  The data movement mode depends on whether the Informatica server should process single-byte or multi-byte character data. This mode selection can affect the enforcement of code page relationships and code page validation in the Informatica Client and Server.
         a) Unicode - the IS allows 2 bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters).
         b) ASCII - the IS holds all data in a single byte.
         The IS data movement mode can be changed in the Informatica server configuration parameters; this takes effect once you restart the Informatica server.
24) Can anyone explain error handling in informatica with examples so that it will be easy to explain the       same in the interview?
ans) Go to the session log file; there we will find information about the
        session initiation process,
        errors encountered, and
        load summary.
       By reviewing the errors encountered while the session was running, we can resolve them.
25) What is change data capture?
ans) Change data capture (CDC) is a set of software design patterns used to determine the data that has         changed in a database so that action can be taken using the changed data.
26) What are the main advantages and purpose of using the Normalizer transformation in Informatica?
ans)      The Normalizer transformation is used mainly with COBOL sources, where the data is usually stored in denormalized format. It can also be used to create multiple rows from a single row of data.

      1) The Normalizer transformation reads data from COBOL sources.
      2) It supports horizontal pivoting, i.e., processing a single input row into multiple output rows.
27) What is the procedure to load the fact table. Give in detail?
ans)      Based on the requirement for your fact table, choose the sources and data and transform them according to your business needs. For the fact table you need a primary key, so use a Sequence Generator transformation to generate a unique key and pipe it to the target (fact) table along with the foreign keys from the source tables.
28) How to delete duplicate rows in flat files source is any option in informatica?
ans) Use a Sorter transformation; it has a "distinct" option, so make use of it.
29) How can we partition a session in Informatica?
ans) The Informatica PowerCenter Partitioning option optimizes parallel processing on multiprocessor hardware by providing a thread-based architecture and built-in data partitioning.
        GUI-based tools reduce the development effort necessary to create data partitions and streamline ongoing troubleshooting and performance tuning tasks, while ensuring data integrity throughout the execution process. As the amount of data within an organization expands and real-time demand for information grows, the PowerCenter Partitioning option enables hardware and applications to provide outstanding performance and jointly scale to handle large volumes of data and users.
30) If you want to create indexes after the load process which transformation you choose?
ans)      It is usually not done at the mapping (transformation) level; it is done at the session level. Create a command task that executes a shell script (on Unix) or another script containing the create index command, and use this command task in the workflow after the session, or add it as a post-session command.
   Alternatively, use an unconnected Stored Procedure transformation with the stored procedure type set to target post-load, and write the procedure in the target database.
31) How can you access the remote source into your session?
ans)      Relational source: to access a relational source located on a remote machine, you need to configure a database connection to the data source.
         File source: to access a remote source file, you must configure an FTP connection to the host machine before you create the session.
         Heterogeneous: when your mapping contains more than one source type, the server manager creates a heterogeneous session that displays source options for all types.
32) In an update strategy, which gives better performance, a target table or a flat file? Why?
ans)      Flat file pros: loading, sorting and merging operations are faster because there are no indexes and the data is in ASCII mode.
        Flat file cons: there is no concept of updating existing records in a flat file.
        Also, because there are no indexes, lookups are slower.
33) How the informatica server increases the session performance through partitioning the source?
ans)      For relational sources, the Informatica server creates a connection for each partition of a single source and extracts a separate range of data over each connection.
        The Informatica server reads multiple partitions of a single source concurrently. Similarly, for loading, it creates multiple connections to the target and loads the partitions of data concurrently.
        For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, it creates a separate file for each partition (of a source file); you can choose to merge the target files.
34) How can you say that the Union transformation is an active transformation?
ans)      By definition, an active transformation is one that can change the number of rows that pass through it. In a Union transformation, the number of rows resulting from the union differs from the number of rows in any one input.
         Since we are combining the results of multiple pipelines (like a UNION of SELECT queries), the number of output rows increases, so it is an active transformation.
34) What is meant by complex mapping?
ans)      A complex mapping is one that involves more logic and more business rules.
            For example, in my bank project I was involved in building a data warehouse.
            The bank has many customers who, after taking loans, relocate to other places.
            It was difficult to maintain both the previous and the current addresses,
            so I used an SCD Type 2 mapping.
            This is a simple example of a complex mapping.



35) What are the options in the target session of an update strategy transformation?
ans)      Update as Insert:
            This option specifies that all update records from the source are flagged as inserts in the target. In other words, instead of updating the records in the target, they are inserted as new records.
            Update else Insert:
            This option enables Informatica to flag the records either for update, if they already exist in the target, or for insert, if they are new records from the source.
     The full set of options is: Insert, Delete, Update as Update, Update as Insert, Update else Insert, and Truncate table.
36) Why did you use stored procedure in your ETL Application?
ans)      Using a stored procedure has the following advantages:
            1. Checks the status of the target database.
            2. Drops and re-creates indexes.
            3. Determines whether enough space exists in the database.
            4. Performs specialized calculations.
37) What are the different options used to configure the sequential batches?
ans)      Two options:
            1. Run the session only if the previous session completes successfully.
            2. Always run the session.
38) Features in 7.1 are:

1. Union and custom transformation
2. Lookup on flat file
3. Grid servers working on different operating systems can coexist on same server
4. We can use pmcmdrep
5. We can export independent and dependent rep objects
6. We can move mapping in any web application
7. Version controlling
8. Data profiling.
38) When to use the Abort and Decode functions?
Ans:
 ABORT can be used to abort/stop the session on an error condition. If the primary key column contains NULL and you need to stop the session from continuing, you may use the ABORT function in the default value for the port. It can be used with the IIF and DECODE functions to abort the session, for example as shown below.
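A minimal sketch of such a default value, assuming a hypothetical EMP_ID primary key port:

IIF(ISNULL(EMP_ID), ABORT('EMP_ID is NULL - stopping session'), EMP_ID)

When a null EMP_ID arrives, ABORT writes the message to the session log and stops the session.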

39)What is constraint based loading?

Ans:
 With constraint-based loading, data is loaded into the target tables based on their key constraints. For example, if we want to load EMP and DEPT data, it first loads the DEPT data and then EMP, because DEPT is the parent table and EMP is the child table. In simple terms, it loads the parent table first and then the child table.

39)In a joiner transformation, you should specify the source with fewer rows as the master source. Why?
Ans: Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.

40)What is Incremental Aggregation?
Ans:
 Whenever a session is created for a mapping that contains an Aggregator transformation, the session option for incremental aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes only the new source data through the mapping and uses the historical cache data to perform the new aggregation calculations incrementally.

41)What is the default join that source qualifier provides?
Ans: Inner equi join.

42)What are the options in the target session of update strategy transformations?
Ans:
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table

43)Which transformation should we use to normalize the COBOL and relational sources?
Ans: Normalizer Transformation.

44)What is the use of tracing levels in transformation?
Ans: Tracing levels determine how much information about the mapping and transformations is written to the session log.

45)What are the basic needs to join two sources in a source qualifier?
Ans: The two sources should have a primary key/foreign key relationship, and they should have matching data types.

46)What is the use of control break statements?

Ans:
 They execute a set of statements within the loop and endloop.
47)What are the types of loading in Informatica?
Ans:
 There are two types of loading: normal loading and bulk loading. In normal loading, records are loaded one by one and a database log is written for each, so it takes comparatively longer to load data into the target. In bulk loading, a batch of records is loaded into the target database at a time, so it takes less time to load the data.

48)What is aggregate cache in aggregator transformation?

Ans:
 The aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.


49)How Union Transformation is used?
Ans:
 The Union transformation is a multiple-input-group transformation that can be used to merge data from various sources (or pipelines). It works just like the UNION ALL statement in SQL, which combines the result sets of two SELECT statements, for example:
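As a sketch (with hypothetical EMP_US and EMP_UK tables), the equivalent SQL is:

SELECT emp_id, emp_name FROM emp_us
UNION ALL
SELECT emp_id, emp_name FROM emp_uk

Like UNION ALL, the Union transformation does not remove duplicate rows.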

50)Can two flat files be joined with Joiner Transformation?
Ans:
 Yes, joiner transformation can be used to join data from two flat file sources.

51) Which transformation do you need when using COBOL sources as source definitions?
Ans:
 The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.

52)What are the types of data that passes between informatica server and stored procedure?
Ans: 3 types of data
• Input/Output parameters
• Return Values
• Status code.
53)What is the status code?
Ans: The status code provides error handling for the Informatica server during the session. The stored procedure issues a status code that indicates whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is used only by the Informatica server to determine whether to continue running the session or stop.

54) What is the use of user-defined environment variables?
User-defined environment variables are used to pass parameters at the job level, stage level and sequencer job level.

55) Describe the stages of the extract/transform process, including what should happen at each stage and in what order.
Extraction means getting data from one or more sources and loading it into the staging database; in other words, bringing the data from different sources (such as databases or flat files) into the ETL tool for further processing.

During extraction we also check nullability and cleanse the data.

Transformation: where we have to implement the following:

1. the business logic for the necessary fields,
2. population of the primary and foreign keys,
3. aggregation of data,
4. data conversion if necessary,
5. implementation of SCDs.
56) What are the roles and responsibilities of an ETL developer/designer/team lead?
            The ETL developer's responsibility is to ensure that the business's goals with regard to data management are met. The developer should be able to design and implement a plan for all types of data movement or modification requested and approved by management. He/she should have intimate knowledge of the database schema and the data itself. The ETL developer has a full understanding of each step in the ETL process and can write ETL programs with scripts, packages, mappings, etc. (depending on the tool). Analysis, development, QA, implementation, monitoring and administration of the ETL process are all part of the role.
57) When can we use stored procedure transformations?
Whenever you would like to perform some activity on the database after loading the data, or before loading the data, as part of the ETL process.

58) If I delete one record in the master table, the same record should also be deleted in the dependent table; which transformation can we use for this?
59) Which is executed first in a session: the Truncate Target Table option or the post-session SQL?
Truncate table is executed first: the truncate runs before the table starts loading, and the post-session SQL executes after the session completes (after the table is loaded).

60) Why do we implement a surrogate key in the ETL process?
ETL is not a one-time process; it is used to make periodic updates to dimensional as well as fact data.
Dimensions usually contain lookup data and have less tendency to change, but if a dimension does change and you want to maintain the change history, a surrogate key helps with that.
61) If there are 10,000 records and the session fails partway through loading, how will you load the remaining data?
Use the perform recovery option in the session properties: the session is recovered from the last commit point.
62) How do you identify existing rows of data in the target table using a lookup transformation?
There are two ways to look up the target table to verify whether a row exists or not:
  1. Use a connected dynamic cache lookup and check the value of the NewLookupRow output port to decide whether the incoming record already exists in the table/cache or not (see the sketch after this list).
  2. Use an unconnected lookup, call it from an Expression transformation, and check the returned lookup port value (null / not null) to decide whether the incoming record already exists in the table or not.
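A sketch of the first approach: with the dynamic cache enabled, an Update Strategy expression can use the NewLookupRow port (1 means the row was just inserted into the cache, i.e. a new record) to route the row:

IIF(NewLookupRow = 1, DD_INSERT, DD_UPDATE)

This simple sketch flags new rows for insert and treats every other row as an update.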

63)What is Code Page Compatibility?
            Compatibility between code pages is needed for accurate data movement when the Informatica Server runs in Unicode data movement mode. If the code pages are identical, there will not be any data loss. One code page can be a subset or superset of another; for accurate data movement, the target code page must be a superset of the source code page.
Superset - a code page is a superset of another code page when it contains all the characters encoded in the other code page, plus additional characters not contained in the other code page.
Subset - a code page is a subset of another code page when all of its characters are encoded in the other code page.
64)What are the target options on the servers?
Target Options for File Target type are FTP File, Loader and MQ
There are no target options for ERP target type.
Target Options for Relational are Insert, Update (as Update), Update (as Insert), Update (else Insert), Delete, and Truncate Table.
65)What is Code Page used for?
Code pages are used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page for the source data.
66)What is Load Manager?
ANS) (The following answer is taken from the Informatica 7.1.1 manual.)
Ans: While running a Workflow, the Power Center Server uses the Load Manager process and the Data Transformation Manager Process (DTM) to run the workflow and carry out workflow tasks. When the Power Center Server runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.
When the Power Center Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled; checks query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
67) How can we store previous session logs?
ANS)   Run the session in timestamp mode; then a new session log is created for each run and previous session logs are not overwritten.
68) What are two types of processes that informatica runs the session?
ANS)   Load Manager process: starts the session, creates the DTM process, and sends post-session email when the session completes.
            DTM process: creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.
69) Where is the cache stored in informatica?
ANS)   The cache is stored in the Informatica server's memory, and overflow data is stored on disk in cache files, which are automatically deleted after successful completion of the session run. If you want to keep the cache data, you have to use a persistent cache.
70) If you have four lookup tables in the workflow. How do you troubleshoot to improve performance?
ANS) There are many ways to improve a mapping that has multiple lookups:
            1) We can create an index on the lookup table if we have permissions (staging area).
            2) Divide the lookup mapping into two: (a) dedicate one mapping to inserts (source minus target, i.e. the new rows), so only new rows come through and processing is fast; (b) dedicate the second one to updates (source intersect target, i.e. the existing rows), so only rows that already exist come into the mapping.
            3) We can increase the cache size of the lookup.
71) If a workflow has 5 sessions running sequentially and the 3rd session fails, how can we run again from the 3rd to the 5th session only?
ANS) If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session. To recover a session in a concurrent batch: 1. Copy the failed session using Operations > Copy Session. 2. Drag the copied session outside the batch to be a standalone session. 3. Follow the steps to recover a standalone session. 4. Delete the standalone copy.
                      In this question all the sessions are sequential, so you can start from the failed session using "Start Workflow From Task"; from there it will continue to run the rest of the tasks.
72) What are the options/types for running a Stored Procedure?
A:
Normal: During a session, the stored procedure runs where the transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.

Pre-load of the Source. Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source. After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables.           
Pre-load of the Target. Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.
Post-load of the Target. After the session sends data to the target, the stored procedure runs. This is useful for re-creating indexes on the database. 
It must contain at least one Input and one Output port.
73) If a sequence generator (with increment of 1) is connected to (say) 3 targets and each target uses the NEXTVAL port, what value will each target get?
A: Each target will get every third value, i.e., the values arrive in increments of 3.
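For example, with a start value of 1 and an increment of 1, one possible distribution is: target 1 receives 1, 4, 7, ..., target 2 receives 2, 5, 8, ..., and target 3 receives 3, 6, 9, ...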
74) Have you used the Abort, Decode functions?
A: Abort can be used to Abort / stop the session on an error condition.
If the primary key column contains NULL, and you need to stop the session from continuing then you may use ABORT function in the default value for the port. It can be used with IIF and DECODE function to Abort the session.
75) Have you used External loader? What is the difference between normal and bulk loading?
A: An external loader performs a direct data load to the table/data files, bypassing the SQL layer, and does not log the data. During a normal data load, data passes through the SQL layer and is logged to the archive log file, and as a result it is slower.
76) Do you enable/disable decimal arithmetic in session properties?
A: Disabling Decimal Arithmetic will improve the session performance but it converts numeric values to double, thus leading to reduced accuracy.
77)What are the Privileges of Default Repository and Extended Repository user?
A:
Ø  Default Repository Privileges
o   Use Designer
o   Browse Repository
o   Create Session and Batches
Ø  Extended Repository Privileges
o   Session Operator
o   Administer Repository
o   Administer Server
o   Super User

78) How many different locks are available for repository objects
A: There are five kinds of locks available on repository objects:

Ø  Read lock. Created when you open a repository object in a folder for which you do not have write permission. Also created when you open an object with an existing write lock.
Ø  Write lock. Created when you create or edit a repository object in a folder for which you have write permission.
Ø  Execute lock. Created when you start a session or batch, or when the Informatica Server starts a scheduled session or batch.
Ø  Fetch lock. Created when the repository reads information about repository objects from the database.
Ø  Save lock. Created when you save information to the repository.

79)How to improve the Session performance?
1 Run concurrent sessions
2 Partition session (Power center)
3. Tune Parameter – DTM buffer pool, Buffer block size, Index cache size, data cache size, Commit Interval, Tracing level (Normal, Terse, Verbose Init, Verbose Data)
 The session has memory to hold 83 sources and targets; if there are more, the DTM buffer size can be increased.
 The Informatica server uses the index and data caches for the Aggregator, Rank, Lookup and Joiner transformations. The server stores the transformed data from these transformations in the data cache before returning it to the data flow, and stores their group information in the index cache.
 If the allocated data or index cache is not large enough to store the data, the server stores the overflow in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows. This can be seen from the counters.
 Since the data cache is generally larger than the index cache, it should be sized larger than the index cache.
4. Remove Staging area
5. Turn off session recovery
6. Reduce error tracing
80) What are tracing levels?
Normal (default)
Logs initialization and status information, errors encountered, skipped rows due to transformation errors, summarizes session results but not at the row level.
Terse
Log initialization, error messages, notification of rejected data.
Verbose Init.
In addition to normal tracing levels, it also logs additional initialization information, names of index and data files used and detailed transformation statistics.
Verbose Data.
In addition to Verbose Init, it records row-level logs.