提问人:dami 提问时间:1/26/2023 最后编辑:brian d foydami 更新时间:7/16/2023 访问量:123
如何使用 DBD::Oracle 处理 Unicode?
How do I handle Unicode with DBD::Oracle?
问:
perl DBI 文档是这样说的:
Perl 支持两种字符串:Unicode(内部为 utf8)和非 Unicode(如果强制采用编码,则默认为 iso-8859-1)。驱动程序应接受这两种类型的字符串,如果需要,应将它们转换为正在使用的数据库的字符集。同样,当从不是 iso-8859-1 的数据库字符数据中提取时,驱动程序应将其转换为 utf8。
DBD::Sqlite 带有参数 (sqlite_unicode => 1),或带有参数 (pg_enable_utf8 => -1) 的 DBD::P g(默认值)确实会进行此类转换。
使用 DBD::Oracle (v1.83, NLS_LANG='FRENCH_FRANCE.UTF8') 事实并非如此:如果将非 Unicode 字符串传递给 INSERT 或 UPDATE 语句,驱动程序不会自动将它们升级到 utf8。
这是我的测试套件。SQLite 和 Pg 的变体成功,但此 Oracle 变体失败:
use utf8;
use strict;
use warnings;
use Test::More;
use SQL::Abstract::More;
use Scalar::Util qw/looks_like_number/;
use DBI;
my @DBI_CONNECT_ARGS = @ARGV;
my ($table, $key_col, $val_col) = qw/TST_UTF8 KEY VAL/; # assuming this table is already created
binmode $_, ':utf8' for *STDERR, *STDOUT;
# strings for tests
my %str;
$str{utf8} = "il était une bergère"; # has flag utf8 because of 'use utf8'
$str{native} = $str{utf8}; utf8::downgrade($str{native}); # without flag utf8
$str{wide_chars} = "il était une bergère♥♡"; # chars > 256 - cannot be a native string (\x{2665}\x{2661})
$str{named_chars} = "il \N{LATIN SMALL LETTER E WITH ACUTE}tait une " # identical to string 'wide_chars'
. "berg\N{LATIN SMALL LETTER E WITH GRAVE}re"
. "\N{BLACK HEART SUIT}\N{WHITE HEART SUIT}";
# check that test strings meet expectations
ok utf8::is_utf8($str{utf8}), "perl string with utf8 flag";
ok !utf8::is_utf8($str{native}), "perl string without utf8 flag, (native chars ... latin1)";
is $str{utf8}, $str{native}, "strings 'utf8' and 'native' have different encodings but represent the same chars";
ok utf8::is_utf8($str{wide_chars}), "string with wide chars must have utf8 flag";
ok utf8::is_utf8($str{named_chars}), "string with named wide chars must have utf8 flag";
is $str{wide_chars}, $str{named_chars}, "named chars are identical to chars from perl source";
my $dbh = DBI->connect(@DBI_CONNECT_ARGS);
my $sqlam = SQL::Abstract::More->new;
my ($sql, @bind);
# suppress records from previous run
my @k = keys %str;
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
my $del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";
# insert strings via bind values
while (my ($key, $val) = each %str) {
($sql, @bind) = $sqlam->insert(-into => $table, -values => {$key_col => $key, $val_col => $val});
my $ins = $dbh->do($sql, {}, @bind);
note "INSERT via bind $key: $ins";
}
# read data back
($sql, @bind) = $sqlam->select(-from => $table,
-columns => [$key_col, $val_col],
-where => {$key_col => {-in => \@k}});
my $rows = $dbh->selectall_arrayref($sql, {}, @bind);
my %str_from_db = map {@$_} @$rows;
# check round trip
is_deeply \%str_from_db, \%str, 'round trip with bind values';
# suppress again
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
$del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";
# insert strings via raw sql
while (my ($key, $val) = each %str) {
my $ins = $dbh->do("INSERT INTO $table($key_col, $val_col) VALUES ('$key', '$val')");
note "INSERT via raw SQL $key: $ins";
}
# check round trip
is_deeply \%str_from_db, \%str, 'round trip with raw SQL';
作为解决方法,我添加了一些用于自动升级本机字符串的回调;有了这个附加功能,测试就通过了:
$dbh->{Callbacks}{prepare} = sub {
# warn "PREPARE : upgrading stmt: $_[1]\n";
utf8::upgrade($_[1]);
return;
};
$dbh->{Callbacks}{ChildCallbacks}{execute} = sub {
# warn "EXECUTE: ";
foreach my $i (1 .. $#_) {
if ($_[$i] && ! ref $_[$i] && ! looks_like_number(($_[$i]))) {
# warn "upgrading $i : $_[$i];";
utf8::upgrade($_[$i]);
}
}
print STDERR "\n";
return;
};
如果我正确理解 DBI 规范,则此自动升级应由 DBD::Oracle 驱动程序执行,而不是由应用程序代码执行。还是我错过了什么?
答:
这是 DBD::Oracle 的一个缺陷。 已在 https://github.com/perl5-dbi/DBD-Oracle/issues/161 打开一个问题。 同时,Perl 模块 https://metacpan.org/pod/DBIx::Oracle::UpgradeUtf8 可以用作解决方法:它在 DBI 级别拦截方法调用并自动升级没有 utf8 标志的 perl 字符串。
(转载自 https://github.com/perl5-dbi/DBD-Oracle/issues/161)
我更深入地研究了这种情况,得出的结论是,即使 DBD::Oracle 确实像 DBD:SQLite 或 DBD::P g 那样自动升级本机字符串,它也不能完全解决问题。
Perl utf8::upgrade 假定 iso-8859-1 字符集;因此,如果这是自动执行的,则当本机字符集不同时(例如 Windows-1252)就不方便了。因此,我编写了一个新的CPAN模块,该模块允许客户端明确指定要从以下位置解码的字符集:https://metacpan.org/pod/DBIx::AutoUpgrade::NativeStrings。
由于我不再认为 DBD::Oracle 绝对应该进行升级,因此我将关闭该问题。
评论
is_deeply \%str_from_db, \%str
is Dumper(\%str_from_db), Dumper(\%str)
use Data::Dumper; $Data::Dumper::Sortkeys=1;
FRENCH_FRANCE.UTF8
FRENCH_FRANCE.AL32UTF8