如何使用 DBD：：Oracle 处理 Unicode？-解网

问：

perl DBI 文档是这样说的：

Perl 支持两种字符串：Unicode（内部为 utf8）和非 Unicode（如果强制采用编码，则默认为 iso-8859-1）。驱动程序应接受这两种类型的字符串，如果需要，应将它们转换为正在使用的数据库的字符集。同样，当从不是 iso-8859-1 的数据库字符数据中提取时，驱动程序应将其转换为 utf8。

DBD：：Sqlite 带有参数（sqlite_unicode => 1），或带有参数（pg_enable_utf8 => -1）的 DBD：:P g（默认值）确实会进行此类转换。

使用 DBD：：Oracle （v1.83， NLS_LANG='FRENCH_FRANCE.UTF8'）事实并非如此：如果将非 Unicode 字符串传递给 INSERT 或 UPDATE 语句，驱动程序不会自动将它们升级到 utf8。

这是我的测试套件。SQLite 和 Pg 的变体成功，但此 Oracle 变体失败：

use utf8;
use strict;
use warnings;
use Test::More;
use SQL::Abstract::More;
use Scalar::Util   qw/looks_like_number/;
use DBI;

my @DBI_CONNECT_ARGS = @ARGV;
my ($table, $key_col, $val_col) = qw/TST_UTF8 KEY VAL/; # assuming this table is already created


binmode $_, ':utf8' for *STDERR, *STDOUT;

# strings for tests
my %str;
$str{utf8}        = "il était une bergère";                             # has flag utf8 because of 'use utf8'
$str{native}      = $str{utf8}; utf8::downgrade($str{native});          # without flag utf8
$str{wide_chars}  = "il était une bergère♥♡";                          # chars > 256 - cannot be a native string (\x{2665}\x{2661})
$str{named_chars} = "il \N{LATIN SMALL LETTER E WITH ACUTE}tait une "   # identical to string 'wide_chars'
                  . "berg\N{LATIN SMALL LETTER E WITH GRAVE}re"
                  . "\N{BLACK HEART SUIT}\N{WHITE HEART SUIT}";


# check that test strings meet expectations
ok  utf8::is_utf8($str{utf8}),          "perl string with utf8 flag";
ok !utf8::is_utf8($str{native}),        "perl string without utf8 flag, (native chars ... latin1)";
is $str{utf8}, $str{native},            "strings 'utf8' and 'native' have different encodings but represent the same chars";
ok utf8::is_utf8($str{wide_chars}),     "string with wide chars must have utf8 flag";
ok utf8::is_utf8($str{named_chars}),    "string with named wide chars must have utf8 flag";
is $str{wide_chars}, $str{named_chars}, "named chars are identical to chars from perl source";



my $dbh   = DBI->connect(@DBI_CONNECT_ARGS);
my $sqlam = SQL::Abstract::More->new;
my ($sql, @bind);


# suppress records from previous run
my @k = keys %str;
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
my $del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";

# insert strings via bind values
while (my ($key, $val) = each %str) {
  ($sql, @bind) = $sqlam->insert(-into => $table, -values => {$key_col => $key, $val_col => $val});
  my $ins = $dbh->do($sql, {}, @bind);
  note "INSERT via bind $key: $ins";
}

# read data back
($sql, @bind) = $sqlam->select(-from    => $table,
                               -columns => [$key_col, $val_col],
                               -where   => {$key_col => {-in => \@k}});
my $rows = $dbh->selectall_arrayref($sql, {}, @bind);
my %str_from_db = map {@$_} @$rows;

# check round trip
is_deeply \%str_from_db, \%str, 'round trip with bind values';

# suppress again
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
$del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";

# insert strings via raw sql
while (my ($key, $val) = each %str) {
  my $ins = $dbh->do("INSERT INTO $table($key_col, $val_col) VALUES ('$key', '$val')");
  note "INSERT via raw SQL $key: $ins";
}

# check round trip
is_deeply \%str_from_db, \%str, 'round trip with raw SQL';

作为解决方法，我添加了一些用于自动升级本机字符串的回调;有了这个附加功能，测试就通过了：

   $dbh->{Callbacks}{prepare} = sub {
      # warn "PREPARE : upgrading stmt: $_[1]\n";
      utf8::upgrade($_[1]);
      return;
    };
    $dbh->{Callbacks}{ChildCallbacks}{execute} = sub {
      # warn "EXECUTE: ";
      foreach my $i (1 .. $#_) {
        if ($_[$i] && ! ref $_[$i] && ! looks_like_number(($_[$i]))) {
          # warn "upgrading $i : $_[$i];";
          utf8::upgrade($_[$i]);
        }
      }
      print STDERR "\n";
      return;
    };

如果我正确理解 DBI 规范，则此自动升级应由 DBD：：Oracle 驱动程序执行，而不是由应用程序代码执行。还是我错过了什么？

Oracle Perl UTF-8 DBI

不确定我是否理解以及它是否相关，但我在添加后替换了测试并通过。另外，应该用？（但我认为这无关紧要）is_deeply \%str_from_db, \%stris Dumper(\%str_from_db), Dumper(\%str)use Data::Dumper; $Data::Dumper::Sortkeys=1;FRENCH_FRANCE.UTF8FRENCH_FRANCE.AL32UTF8

0赞 dami 1/29/2023

@Kjetil S. ：谢谢你的建议。关于字符集：你是对的，AL32UTF8推荐在 docs.oracle.com/en/database/oracle/oracle-database/19/nlspg/...（并且与我的数据库的字符集匹配）。不过，它并不能解决问题。关于 Dumper 建议：我尝试过（尽管持怀疑态度），但它无济于事：“本机”字符串不同，因为重音字符被 OCI 层替换

答：

0赞 dami 3/14/2023 #1

这是 DBD：：Oracle 的一个缺陷。已在 https://github.com/perl5-dbi/DBD-Oracle/issues/161 打开一个问题。同时，Perl 模块 https://metacpan.org/pod/DBIx::Oracle::UpgradeUtf8 可以用作解决方法：它在 DBI 级别拦截方法调用并自动升级没有 utf8 标志的 perl 字符串。

1赞 dami 7/16/2023 #2

（转载自 https://github.com/perl5-dbi/DBD-Oracle/issues/161)

我更深入地研究了这种情况，得出的结论是，即使 DBD：：Oracle 确实像 DBD：SQLite 或 DBD：:P g 那样自动升级本机字符串，它也不能完全解决问题。

Perl utf8：：upgrade 假定 iso-8859-1 字符集;因此，如果这是自动执行的，则当本机字符集不同时（例如 Windows-1252）就不方便了。因此，我编写了一个新的CPAN模块，该模块允许客户端明确指定要从以下位置解码的字符集：https://metacpan.org/pod/DBIx::AutoUpgrade::NativeStrings。

由于我不再认为 DBD：：Oracle 绝对应该进行升级，因此我将关闭该问题。

上一个：如何将 utf8 字符串编写为二进制文件

下一个：使用“use utf8;”给了我“打印中的宽字符”

如何使用 DBD：：Oracle 处理 Unicode？

How do I handle Unicode with DBD::Oracle?

评论