Tuesday 17 December 2013

Exporting from Oracle in UTF8

Use NLS_LANG=ENGLISH_UNITED KINGDOM.UTF8

Identifying failed characters in a postgresql migration from Oracle.

Due to a failure somewhere along the line of a migration from another database to postgresql some of the characters have failed. Since there's a large number of tables and we don't know which columns have characters that have failed here's how we did it;

A pg_dump of the database: pg_dump --data-only --column-inserts -U <user> <db> > dump.txt. Failed characters would show up as '?' so I planned to remove any lines from the text file with a bit of reg exp in notepad++: ^((?!\?).)*$ but the dump file being almost 5Gb was too big for notepad++ to handle. So in steps Powershell and we can do a grep style command: select-string .\dump.txt -pattern "\?" > failed_characters.txt

Bear in mind that this will include valid ? characters but it's start.

And then split the resultant file because it's too big to load into Notepad. [Taken from here]

$linecount=0;$i=0;Get-Content .\BIG_LOG_FILE.txt | %{ Add-Content OUT$i.log "$_";$linecount++; if($linecount -eq 3000){$I++;$linecount=0}}