Perspective on Improving Matching Performance of Large SAP MDM Repositories

Some immovable requirements force you to replicate the data model of SAP ECC in SAP MDM. The result is a large repository with over 250 fields. Once this repository is loaded with over a million records and you start matching or searching through SAP Enterprise Portal, you will encounter several performance challenges.
A slow MDM system will never be accepted by business users, so it is essential to identify and mitigate these risks proactively. Clients have to be aware that replicating the data model in SAP MDM may be the easiest option compared with building a solution on eSOA, but it carries the risk of poor real-time performance. For accurate matching results we use the Token Equals feature, but this leads to very long processing times, especially when working through a million records with over 250 fields.
The following SAP document is a good starting point for improving performance: “How to Optimize an MDM Matching Process”.
Perspective:
One way to enable faster matching and searching is to use two separate repositories: one dedicated to matching and searching tasks, and the other acting as the parent repository with all fields. The dedicated matching repository should hold only the crucial fields, such as the matching fields and the primary and foreign keys. The Portal would connect to this smaller repository for matching. Once the results are displayed on SAP Enterprise Portal, the user can choose to add, delete or update records, and the resulting action would then be carried out against the main repository.
Keeping a smaller dedicated repository for matching also reduces the loading time. Note that you cannot use the Slave feature in Console for this, because a slave repository must have the same fields as its parent. As per the SAP document, another good practice is to improve speed by using calculated fields in the smaller repository. These calculated fields hold trimmed values of the matching criteria: for example, First Name can be trimmed to its first three characters and stored in one calculated field, its first two characters in another calculated field, and so on. Using the Equals feature on these fields, matching can be extremely fast, but in our analysis the results were not as accurate as those obtained with Token Equals.
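To make the trade-off concrete, here is a small illustration outside MDM of what the calculated-field approach buys you: because the trimmed key is computed once when the record is loaded, candidate lookup reduces to a cheap exact-equality comparison, while token-based matching has to tokenize and compare the full values for every candidate. The names and sample data below are invented for illustration.

import java.util.*;

public class TrimmedKeyMatchDemo {
    // Hypothetical calculated field: first three characters of First Name, upper-cased.
    static String trimmedKey(String firstName) {
        String s = firstName.trim().toUpperCase();
        return s.length() <= 3 ? s : s.substring(0, 3);
    }

    public static void main(String[] args) {
        List<String> repository = Arrays.asList("Jonathan Smith", "Jon Smyth", "Maria Gomez");

        // Fast path: Equals on the precomputed trimmed key (what the calculated field enables).
        Map<String, List<String>> index = new HashMap<>();
        for (String name : repository) {
            index.computeIfAbsent(trimmedKey(name), k -> new ArrayList<>()).add(name);
        }
        System.out.println("Equals on trimmed key 'JON': " + index.get("JON"));

        // Accurate but slower path: token-based comparison against every record.
        Set<String> sourceTokens = new HashSet<>(Arrays.asList("jonathan", "smith"));
        for (String name : repository) {
            Set<String> tokens = new HashSet<>(Arrays.asList(name.toLowerCase().split("\\s+")));
            tokens.retainAll(sourceTokens);
            System.out.println(name + " shares " + tokens.size() + " token(s) with the source record");
        }
    }
}

Note how the trimmed key pulls in “Jon Smyth” as a candidate while the token comparison does not, which is exactly the looser-but-faster behavior described above.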
Handling the dilemma of high accuracy with long delays versus faster results with average accuracy was a good learning experience for us. A number of “what if” scenarios need to be run, recording the options tried and the time taken for each: different calculated fields, different matching strategies and scores, and the choice of Equals or Token Equals for each matching field. Analyzing these performance runs gives quantified data to support the chosen approach, and with this study of matching behavior you should be able to identify the approach that delivers accurate results in the shortest time.
A good practice during Blueprint workshops is to present all the results along with the choice of matching strategy, scores and threshold limits. If client stakeholders disagree on the best matching criteria, statistical techniques such as average ranking or Spearman’s rank correlation can be used to reconcile their preferences. Since each project is unique, it is difficult to generalize the approach; for Business Partners, for example, using calculated fields for both First Name and Last Name increases accuracy more than using calculated fields for First Name alone.
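As a quick illustration of how Spearman’s rank correlation could be applied, suppose two stakeholders each rank the candidate matching configurations from the what-if runs; a coefficient close to 1 indicates broad agreement, while a low value signals that the criteria need further discussion. The rankings below are invented.

public class SpearmanRankDemo {
    // Spearman's rho for two rankings without ties: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    static double spearman(int[] rankA, int[] rankB) {
        int n = rankA.length;
        double sumDSquared = 0;
        for (int i = 0; i < n; i++) {
            double d = rankA[i] - rankB[i];
            sumDSquared += d * d;
        }
        return 1.0 - (6.0 * sumDSquared) / (n * ((double) n * n - 1));
    }

    public static void main(String[] args) {
        // Hypothetical rankings of four candidate matching configurations by two stakeholders.
        int[] stakeholder1 = {1, 2, 3, 4};
        int[] stakeholder2 = {2, 1, 3, 4};
        System.out.println("Spearman's rho = " + spearman(stakeholder1, stakeholder2)); // 0.8
    }
}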
Insight into the behavior of business users helps in deciding between Token Equals and Equals on calculated fields. For everyday use by business users you could choose Equals on calculated fields, purely for the speed of the results, while Token Equals matching can be run periodically, say weekly or fortnightly, by a Data Administrator to identify possible duplicates. This dual approach involves some redundant activity but ensures healthy data.
Data analysis using random sampling gives insight into how the master data is spread across dimensions such as Country and Organization. Depending on the pattern of classification, you could filter records by Country, Region or category (in retail, for example, Apparel or Food) so that matching runs against a smaller candidate set and performs faster.
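If matching is driven through the MDM Java API, such a filter can be expressed as a constraint on the target search before the strategy is executed. The snippet below is only a sketch: the table and field ids are placeholders, and the search-constraint classes and constants should be verified against the API version in use.

// Illustrative only: restrict the candidate set for matching to records of one country.
// TableId/FieldId values are placeholders; look up the real ids from the repository schema.
Search target = new Search(new TableId(1));                                     // main table
FieldSearchDimension countryField = new FieldSearchDimension(new FieldId(42));  // hypothetical Country field
TextSearchConstraint isGermany = new TextSearchConstraint("DE", TextSearchConstraint.EQUALS);
target.addSearchItem(countryField, isGermany);
// The filtered Search object can then be supplied as the target of the matching strategy,
// so the strategy only compares the source record against records in the selected country.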
The best practice is to stick to the principle that global master data should be stored in SAP MDM, with the remaining transactional fields kept in the respective systems such as SAP ECC. This gives a standardized data model and attributes for global use, instead of replicating the legacy or SAP ECC data model in SAP MDM.

Navendu Shirali is a Consultant in the SAP MDM Center of Excellence at Infosys. His areas of work include building new solutions, converting opportunities and master data consulting.

 


Using MDM Java APIs to retrieve and execute Matching strategies in MDM

Taking forward the build of my customised Data Manager using MDM Java APIs, this blog is the latest addition to the series:
Using MDM Java APIs to retrieve Taxonomy attribute Values
Using MDM Java APIs to add Text(type) taxonomy attribute allowable values
Before I demonstrate using MDM Java APIs to retrieve matching strategies, let us understand what matching strategies are and how we create or add them in MDM.
A matching strategy comprises one or more matching rules and a pair of numeric thresholds. Each strategy can be executed for a set of one or more records against that set itself, against the current search results, or against all of the records in the repository.
Matching strategies identify potential matches for each record based on the matching scores of the individual rules that make up the strategy and on the thresholds that determine which records qualify as potential matches.
1. As a first step we will create a matching strategy from within the Matching mode of Data Manager. To add a new strategy to the list of strategies:

  • If necessary, click on the Strategies tab to make it the active tab.
  • Right-click in the Strategies pane and choose Add Strategy from the context menu, or choose Records > Matching > Strategies > Add Item from the main menu.
  • MDM adds a new matching strategy named “New Strategy” to the list of strategies, and highlights it for editing.
  • Include the columns against which target records are to be matched, say Material_Number in the current scenario.
  • Type the name ‘Material Strategy’ for the matching strategy and press Enter.

2. Now that we have created the matching strategy named ‘Material Strategy’, let us look at the MDM Java API code snippets required to execute it. Using the Java API, we first retrieve the matching strategies available in the MDM repository:

RetrieveMatchingStrategiesCommand retMatStr = new RetrieveMatchingStrategiesCommand(connection);
retMatStr.setSession(authUserSession);
try {
    retMatStr.execute();
} catch (CommandException e) {
    e.printStackTrace();
}
// We have only one matching strategy, so retrieve its id from the first array element.
matchStID = retMatStr.getMatchingStrategies()[0].getId();
….where
connection and authUserSession are variables holding the MDM connection and the authenticated user session, respectively.
After fetching the matching strategy id, we are left with the following two steps:
3. For any new records being created, we use the matching strategy id to execute the matching strategy and find matching records:

ExecuteMatchingStrategyForNewRecordValuesCommand exeMatstr = new ExecuteMatchingStrategyForNewRecordValuesCommand(connection);
exeMatstr.setSession(authUserSession);
exeMatstr.setStrategyId(matchStID);
exeMatstr.setSource(recs);
// The target can be supplied as an explicit array of record ids...
exeMatstr.setTarget(rids);
// ...or as a Search object covering, for example, all records in the main table.
exeMatstr.setTarget(new Search(new TableId(1)));
try {
    exeMatstr.execute();
} catch (CommandException e) {
    e.printStackTrace();
}
….where
recs is the array of source records, i.e. the records to find matches for (currently only one source record is supported)
rids is the array of target record ids, i.e. the records to match against
setTarget(new Search(new TableId(1))) instead sets the target records in the form of a search object, here all records in the main table; typically you would use one form of target or the other.
4. After executing the matching strategy we execute the RetrieveMatchedRecordsCommand to retrieve the records that matched the source record for the strategy just executed:

RetrieveMatchedRecordsCommand retMR = new RetrieveMatchedRecordsCommand(connection);
retMR.setSession(authUserSession);
retMR.setMatchingTaskId(exeMatstr.getMatchingTaskId());
retMR.setRecordId(new RecordId(-1));
retMR.setResultDefinition(new ResultDefinition(new TableId(1)));
try {
    retMR.execute();
} catch (CommandException e) {
    e.printStackTrace();
}
…..where
setMatchingTaskId(exeMatstr.getMatchingTaskId()) sets the matching task id obtained from the command object of the previous step; this identifier is a handle to the matching strategy execution that was started there.
setRecordId(new RecordId(-1)) sets the source record id on which the matched records are based; for external record matching, the record id is -1.

As discussed in this blog, one can use these steps to execute matching strategies that already exist in MDM in a different way. One might argue for running parallel searches in MDM through the API to accomplish a certain kind of matching between records instead of executing matching strategies like this. I would say this is a feature provided by the Java API: it empowers developers to replicate Data Manager features over the web, and I personally feel that using it this way saves a lot of the development time that would otherwise go into explicitly coding such a matching strategy with the APIs.
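For reference, the three snippets can be strung together into one flow. The sketch below only combines the calls already shown above; connection, authUserSession, recs and matchStID are assumed to be declared and initialized as in the individual steps.

try {
    // Step 2: retrieve the available matching strategies and pick the first one.
    RetrieveMatchingStrategiesCommand retMatStr = new RetrieveMatchingStrategiesCommand(connection);
    retMatStr.setSession(authUserSession);
    retMatStr.execute();
    matchStID = retMatStr.getMatchingStrategies()[0].getId();

    // Step 3: execute the strategy for the new record values against all main-table records.
    ExecuteMatchingStrategyForNewRecordValuesCommand exeMatstr =
            new ExecuteMatchingStrategyForNewRecordValuesCommand(connection);
    exeMatstr.setSession(authUserSession);
    exeMatstr.setStrategyId(matchStID);
    exeMatstr.setSource(recs);
    exeMatstr.setTarget(new Search(new TableId(1)));
    exeMatstr.execute();

    // Step 4: retrieve the records that matched the (external) source record.
    RetrieveMatchedRecordsCommand retMR = new RetrieveMatchedRecordsCommand(connection);
    retMR.setSession(authUserSession);
    retMR.setMatchingTaskId(exeMatstr.getMatchingTaskId());
    retMR.setRecordId(new RecordId(-1));
    retMR.setResultDefinition(new ResultDefinition(new TableId(1)));
    retMR.execute();
    // retMR now holds the match results for display, e.g. in a custom web Data Manager.
} catch (CommandException e) {
    e.printStackTrace();
}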

 
