My client gave me a mailing list that they wanted to import into Compiere and subscribe them to their various interest areas. The list only contained first name,last name, and an email address. Not exactly a great set of information to load in, but fairly straight forward to import. I customized Compiere to not only have Full Name, but also the first and last names ( full name = name, first name = name2, and last name = name3 ). I loaded the i_bpartner table and ran the process. First thing I noticed was a SQL exception being thrown. I investigated the problem and found some listings had email addresses that were too long. The ImportBPartner server process runs through various update statements to set values such as c_region_id, c_country_id, etc. It also sets the email address as the value if it does not have one set. Im not sure why it does this. If a new business partner is created in Compiere, the value ( search key ) is generated automatically with a numeric value. I think this is an area that should be addressed and be consistent with the rest of the system. Anyway, sadly I had to truncate these few emails by hand. The emails were gmail accounts that take advantage of its alias feature. I fixed them after I imported the data since this was just a few records.
With that data fixed, I ran the imports again. This time I got a poor result from the import process and went to check the log files. I saw a bunch of ad_user.email constraint violations. Damn. I assumed the list that I was given was cleansed, but that was an assumption I should not have made. Normally I’ll take files and run them through Talend, but I have not done any data importing in a while so I just spaced it. The import bpartner process left 1 record with some information in the error field, but of course the error was useless. I ran the process again after I deleted the 1 record that had the error just to see if it worked and of course it didn’t because there were around 73 records that had duplicate email addresses. I went through all the import records, removed the duplicates and reran. This time everything went nicely.
This was all happening the previous day. Today I got a bunch of reports from the call center reps ( I built a customer call center that interfaces with Compiere via its web services. The call center is a .Net application I wrote….long story ) saying pulling up customers was taking a very long time. This baffled me because I only added 7,000 rows ( the contactinterest table has over a million rows so 7,000 was nothing ). What could be slowing the thing down? I was told it only happened for the first part of the morning and so it went back to normal and I just moved on. I went to validate the data I loaded in against the original list and found something. One of the business partners had over 1,300 contacts associated with it. Looking at the created date, I saw they were all from the previous day. Panicing, I ran a query to find all ad_user records that have more than 5 counts of the same c_bpartner_id. I found 1 ad_user record. I can’t be certain of this, but I believe it is related to the import process. It makes sense since Compiere treats emails as the unique key.
The moral of the story is to always check your data.I even considered writing some “pre-check” code to alert me to duplicate emails because for the most part, I am doing inserts, not updates and any duplicate emails would cause problems. The count that shows up at the end of the import process should have tipped me off when it gave me 7000 errors, but 7000 updates. That number count doesn’t make sense, but it should have alerted me mentally. Live and learn.