|
I have several RRF (rich release format) files from the UMLS distribution extracted using their Metamorphosys program on a Windows machine. But I did not have the Oracle table creation and loading scripts, because that option was not selected in Metamorphosys while extracting the RRF files on Windows. So, I copied those RRF files to my linux machine where I had the Oracle scripts from my previous Metamorphosys extraction. When I tried to run the Oracle load script, SQLLDR kept dying with the error "SQL*Loader-510: Physical record in data file XXX.YYY is longer than the maximum (1048576)". It took a bit of searching around the Internet to diagnose and fix this cryptic error message. Ah, the joys of confusing Oracle error messages!
It seems that the problem was because the SQLLDR (Oracle SQL Loader, a command line Oracle data loading utility) program on Linux does not like the Windows-style line breaks. Windows uses CR-LF (carriage return and line break character combination) as the line break character. Linux uses LF as the line break character. (Mac uses CR as the line break character up to Mac OS 9. Mac OS X uses Unix-style line breaks).
The RRF files extracted on Windows had CR-LF as the line break character, and the SQL Loader on Linux did not like it. So, it died with the above mentioned cryptic error message. It threw an error message similar to the following for all tables:
SQL*Loader-510: Physical record in data file MRCONSO.RRF is longer than the maximum (1048576)
I compared the Oracle table's structure, the structure of the CTL file (Oracle control file for data loading) and the structure of the UMLS RRF file (Unified Medical Language System Rich Release Format - a pipe delimited UTF-8 encoded text file used by UMLS for data distribution). They were all identical - the data types, field size and so on. So, I could not understand why SQL Loader was causing that error.
Finally. I found that someone had mentioned in passing on some obscure mailing list on the Internet that SQLLDR on Windows does not like Unix style line breaks. So, I thought the converse must be true. I converted the Windows style line breaks into Unix-style line breaks, and voi-la, it's loading the data faster than I can blink.
Why can't Oracle make error messages meaningful and understantable so that one can really use those messages to troubleshoot it? This is not the first time I encounter such cryptic, confusing and time wasting error messages with Oracle products (Oracle sever, SQL plus client or SQL Loader), and I'm afraid this will not be the last.
By the way, given the huge company that Oracle is, why can't they make the SQLLDR program understand any style of line breaks on any platform? This really makes it hard for those who create datafiles - the file I am loading has several million rows, and it would be a waste of disk space on any distribution media to include different versions with the only difference being the style of line breaks. This was my version with Oracle version 10, not version 0.x or 1.x or even 9.x. Will Oracle listen?
By the way, if you want a way to convert the files, you can use the sed command line program on Linux to do this. If you want a GUI, you can do this with Kwrite (Go to Tools > End of line > Unix) and then save it (preferably under a new name). Hope this helps.
|