Asked by: Pooja  Asked: 1/8/2023  Modified: 1/9/2023  Viewed: 234
How to compare lists/maps of custom objects field by field to create a mismatch report for a very large data set, generically, using Java 8 or later?
Question:
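I have been working on comparing data between two different database sources in Java. Because of some other constraints, I cannot do the comparison directly in the database.
- I have 50 tables to compare.
- Row counts range from 10k to 500k per table (so the algorithm needs to be efficient).
- The number of columns and the field names also differ from table to table (of course).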
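I wrote the code below using for loops, and it has these limitations:
- Some tables can hold a lot of data, so the for-loop solution is not efficient.
- Each table has a different number of columns, so the logic I wrote does not work for all of them and I have to repeat it for every table. That is a lot of boilerplate.
- If a new column is added to a table, the comparison logic has to be updated as well.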
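On the efficiency point: two nested loops over the tables perform n·m key comparisons, which for 500k rows on each side is about 2.5·10^11. The usual remedy is to index one side by its key in a HashMap first, so each row on the other side is matched with one expected O(1) lookup, giving O(n + m) overall. A minimal sketch, assuming every row has a unique key; KeyJoin and matchByKey are hypothetical names, and rows are plain String[] arrays only to keep the example self-contained:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public final class KeyJoin {

    // Pairs up rows from two lists that share the same key.
    // One pass builds a HashMap index of 'right', one pass probes it with 'left':
    // O(n + m) expected time instead of O(n * m) for two nested loops.
    // Assumes keys are unique within each list (a duplicate key in 'right' overwrites the earlier row).
    public static <T, K> List<Map.Entry<T, T>> matchByKey(List<T> left, List<T> right, Function<T, K> key) {
        Map<K, T> rightIndex = new HashMap<>();
        for (T row : right) {
            rightIndex.put(key.apply(row), row);
        }
        List<Map.Entry<T, T>> pairs = new ArrayList<>();
        for (T row : left) {
            T other = rightIndex.get(key.apply(row));
            if (other != null) { // rows without a partner are skipped here
                pairs.add(new AbstractMap.SimpleEntry<>(row, other));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each row is {id, column3}; only the key position matters to matchByKey.
        List<String[]> db1 = Arrays.asList(new String[]{"1", "Blue"}, new String[]{"2", "Red"});
        List<String[]> db2 = Arrays.asList(new String[]{"2", "Red1"}, new String[]{"1", "Blue"});
        for (Map.Entry<String[], String[]> pair : matchByKey(db1, db2, row -> row[0])) {
            System.out.println(pair.getKey()[1] + " vs " + pair.getValue()[1]);
        }
        // prints "Blue vs Blue" then "Red vs Red1"
    }
}
```

The pairs come out in the order of the left list, so the report order follows DB1 regardless of how DB2 is sorted.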
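My requirements:
- I want efficient code that produces a field-by-field mismatch report for the given lists of custom objects.
- The comparison code should be able to compare lists of any type of custom object. (I don't know how to do this.)
- Be able to create the table POJOs from a properties file that contains the column list of every table.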
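On the third requirement: if each row is read generically as a Map<String, String> of column name to value (for example straight from a JDBC ResultSet), the column list can live in a properties file and no per-table POJO is needed; adding a column then means editing the file, not the code. A minimal sketch under that assumption; the "<table>.columns" key format and the PropertyDrivenCompare class are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;

public final class PropertyDrivenCompare {

    // Reads the column list for one table, e.g. "table1.columns=column1,column2,column3".
    // The "<table>.columns" key format is a hypothetical convention for this sketch.
    public static List<String> columnsFor(Properties props, String table) {
        String value = props.getProperty(table + ".columns", "");
        List<String> columns = new ArrayList<>();
        for (String c : value.split(",")) {
            if (!c.trim().isEmpty()) {
                columns.add(c.trim());
            }
        }
        return columns;
    }

    // Compares two rows (column name -> value) on the configured columns only
    // and returns one "table.column: db1 -> db2" line per differing column.
    public static List<String> diffRow(String table, List<String> columns,
                                       Map<String, String> db1Row, Map<String, String> db2Row) {
        List<String> mismatches = new ArrayList<>();
        for (String column : columns) {
            String v1 = db1Row.get(column);
            String v2 = db2Row.get(column);
            if (!Objects.equals(v1, v2)) { // null-safe, unlike v1.equals(v2)
                mismatches.add(table + "." + column + ": " + v1 + " -> " + v2);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("table1.columns=column1,column2,column3"));
        List<String> columns = columnsFor(props, "table1");

        Map<String, String> db1Row = new HashMap<>();
        db1Row.put("column1", "2");
        db1Row.put("column2", "Two");
        db1Row.put("column3", "Red");
        Map<String, String> db2Row = new HashMap<>(db1Row);
        db2Row.put("column3", "Red1");

        System.out.println(diffRow("table1", columns, db1Row, db2Row));
        // prints [table1.column3: Red -> Red1]
    }
}
```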
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// TestTable1, TestTable2 and MismatchReport are simple POJOs (not shown)
// with getters/setters for the fields used below.
public class DataComparatorService {

    private List<TestTable1> table1DataList;
    private List<TestTable2> table2DataList;

    public void loadDummyTableObjects() {
        // Constructor arguments: id, column1, column2, column3
        table1DataList =
            Arrays.asList(new TestTable1("1", "1", "One", "Blue"),
                          new TestTable1("2", "2", "Two", "Red"),
                          new TestTable1("3", "3", "Three", "Black"),
                          new TestTable1("4", "4", "Four", "Green"),
                          new TestTable1("5", "5", "Five", "White"));
        table2DataList =
            Arrays.asList(new TestTable2("1", "1", "One", "Blue"),
                          new TestTable2("2", "2", "Two", "Red1"),
                          new TestTable2("3", "3", "Three", "Black"),
                          new TestTable2("4", "4", "Four", "Green"),
                          new TestTable2("5", "5", "Two", "White"));
    }

    public void compareDataWithForLoop() {
        loadDummyTableObjects();
        List<MismatchReport> mismatchReport = new ArrayList<>();
        // Nested loops: every row of table 1 is checked against every row of table 2 (O(n * m))
        for (TestTable1 t1Row : table1DataList) {
            for (TestTable2 t2Row : table2DataList) {
                if (t1Row.getId().equals(t2Row.getId())) {
                    // One hand-written check per column
                    if (!t1Row.getColumn1().equals(t2Row.getColumn1())) {
                        mismatchReport.add(getMismatchReport("Table1", "Column1", t1Row.getColumn1(), t2Row.getColumn1()));
                    }
                    if (!t1Row.getColumn2().equals(t2Row.getColumn2())) {
                        mismatchReport.add(getMismatchReport("Table1", "Column2", t1Row.getColumn2(), t2Row.getColumn2()));
                    }
                    if (!t1Row.getColumn3().equals(t2Row.getColumn3())) {
                        mismatchReport.add(getMismatchReport("Table1", "Column3", t1Row.getColumn3(), t2Row.getColumn3()));
                    }
                }
            }
        }
        System.out.println(mismatchReport);
    }

    private static MismatchReport getMismatchReport(String tableName, String columnName, String db1Value, String db2Value) {
        MismatchReport result = new MismatchReport();
        result.setTableNme(tableName);
        result.setColumnNme(columnName);
        result.setDb1Value(db1Value);
        result.setDb2Value(db2Value);
        return result;
    }

    public static void main(String[] args) {
        DataComparatorService service = new DataComparatorService();
        service.compareDataWithForLoop();
    }
}
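The output format should be the same for every table comparison. Each result should contain the fields TableName, ColumnName, Db1Value and Db2Value, so I can see which column differs and what the mismatched values are. The output of the code above is: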
[MismatchReport{tableNme='Table1', columnNme='Column3', db1Value='Red', db2Value='Red1'},
MismatchReport{tableNme='Table1', columnNme='Column2', db1Value='Five', db2Value='Two'}]
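For the second requirement (a single comparison routine for any row type), one standard-library option is reflection: collect the class's instance fields once, pair rows by a key field through a HashMap, and compare field by field, emitting the same four values as MismatchReport. A minimal sketch, assuming both sides are loaded into the same class and that field names can serve as column names; ReflectiveCompare, Mismatch and the Row stand-in for TestTable1 are hypothetical:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public final class ReflectiveCompare {

    // Same four values as the question's MismatchReport.
    public static final class Mismatch {
        final String table, column, db1Value, db2Value;

        Mismatch(String table, String column, String db1Value, String db2Value) {
            this.table = table;
            this.column = column;
            this.db1Value = db1Value;
            this.db2Value = db2Value;
        }

        @Override
        public String toString() {
            return table + "." + column + ": '" + db1Value + "' -> '" + db2Value + "'";
        }
    }

    // Compares every instance field of rows that share the same value in keyField.
    // Works for any class, so adding a column only means adding a field.
    public static <T> List<Mismatch> compare(String table, Class<T> type, String keyField,
                                             List<T> db1, List<T> db2) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                f.setAccessible(true); // may fail for classes in other modules (Java 9+)
                fields.add(f);
            }
        }
        Field key;
        try {
            key = type.getDeclaredField(keyField);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field " + keyField + " in " + type, e);
        }
        key.setAccessible(true);

        Map<Object, T> db2ByKey = new HashMap<>();
        for (T row : db2) {
            db2ByKey.put(read(key, row), row);
        }
        List<Mismatch> result = new ArrayList<>();
        for (T row1 : db1) {
            T row2 = db2ByKey.get(read(key, row1));
            if (row2 == null) {
                continue; // rows present on only one side are not reported by this sketch
            }
            for (Field f : fields) {
                Object v1 = read(f, row1);
                Object v2 = read(f, row2);
                if (!Objects.equals(v1, v2)) {
                    result.add(new Mismatch(table, f.getName(), String.valueOf(v1), String.valueOf(v2)));
                }
            }
        }
        return result;
    }

    private static Object read(Field f, Object target) {
        try {
            return f.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for TestTable1: id, column1, column2, column3.
    static final class Row {
        private final String id, column1, column2, column3;

        Row(String id, String column1, String column2, String column3) {
            this.id = id;
            this.column1 = column1;
            this.column2 = column2;
            this.column3 = column3;
        }
    }

    public static void main(String[] args) {
        List<Row> db1 = Arrays.asList(new Row("2", "2", "Two", "Red"), new Row("5", "5", "Five", "White"));
        List<Row> db2 = Arrays.asList(new Row("2", "2", "Two", "Red1"), new Row("5", "5", "Two", "White"));
        System.out.println(compare("Table1", Row.class, "id", db1, db2));
        // prints [Table1.column3: 'Red' -> 'Red1', Table1.column2: 'Five' -> 'Two']
    }
}
```

Reflection adds some per-value overhead, but the field list is built once per table, so the cost stays linear in rows times columns. Note that Class.getDeclaredFields() does not guarantee any particular field order.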
Any pointers on how to meet the requirements above would be very helpful.
Answer:
1 vote
Eritrean
1/8/2023
#1
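If I were you, I wouldn't reinvent the wheel; I'd use a third-party library such as JaVers instead.
It is a powerful yet lightweight library. It can do much more, but you can also use it purely as an object-diff tool. As a starting point, I took some of your sample input to show how it could be applied to your use case.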
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import org.javers.core.Javers;
import org.javers.core.JaversBuilder;
import org.javers.core.diff.Diff;
import lombok.AllArgsConstructor;
import lombok.Getter;

public final class Example {

    public static void main(String[] args) {
        //Just copied your sample input but used only one custom class as the second is not really needed
        List<TestTable1> dataDB1 = Arrays.asList(new TestTable1("1", "1", "One", "Blue"),
                                                 new TestTable1("2", "2", "Two", "Red"),
                                                 new TestTable1("3", "3", "Three", "Black"),
                                                 new TestTable1("4", "4", "Four", "Green"),
                                                 new TestTable1("5", "5", "Five", "White"));
        List<TestTable1> dataDB2 = Arrays.asList(new TestTable1("1", "1", "One", "Blue"),
                                                 new TestTable1("2", "2", "Two", "Red1"),
                                                 new TestTable1("3", "3", "Three", "Black"),
                                                 new TestTable1("4", "4", "Four", "Green"),
                                                 new TestTable1("5", "5", "Two", "White"));

        //create a map from your input for a faster access of objects by id
        Map<String, TestTable1> db1Map = dataDB1.stream()
                .collect(Collectors.toMap(TestTable1::getId, Function.identity()));
        Map<String, TestTable1> db2Map = dataDB2.stream()
                .collect(Collectors.toMap(TestTable1::getId, Function.identity()));

        // do your comparison using JaVers
        Javers javers = JaversBuilder.javers().build();
        db1Map.keySet().forEach(key -> {
            Diff diff = javers.compare(db1Map.get(key), db2Map.get(key));
            if (diff.hasChanges()) {
                System.out.println("Changes for id: " + key);
                System.out.println(diff.prettyPrint());
                System.out.println("********************************************************");
                System.out.println();
            }
        });
    }

    // a simple POJO for your data
    @AllArgsConstructor
    @Getter
    public static class TestTable1 {
        String id;
        String column1;
        String column2;
        String column3;
    }
}
Output:
Changes for id: 2
Diff:
* changes on com.mycompany.Example$TestTable1/ :
- 'column3' changed: 'Red' -> 'Red1'
********************************************************
Changes for id: 5
Diff:
* changes on com.mycompany.Example$TestTable1/ :
- 'column2' changed: 'Five' -> 'Two'
********************************************************
I just used prettyPrint() to get a standard printout, but you can configure the output to suit your needs.
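One gap worth noting in the JaVers example: it iterates only db1Map.keySet(), so an id that exists only in DB2 is never reported, and for an id that exists only in DB1, db2Map.get(key) returns null. Also, the two-argument Collectors.toMap throws IllegalStateException on a duplicate id. A minimal sketch of checking key coverage with plain set operations before diffing, and of the three-argument toMap overload with a merge function; KeyCoverage, onlyIn and indexKeepFirst are hypothetical names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

public final class KeyCoverage {

    // Ids present in 'a' but not in 'b'; the TreeSet keeps the report sorted.
    public static <K extends Comparable<K>> Set<K> onlyIn(Set<K> a, Set<K> b) {
        Set<K> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Like the answer's toMap call, but a duplicate id keeps the first row
    // instead of throwing IllegalStateException.
    public static <T, K> Map<K, T> indexKeepFirst(List<T> rows, Function<T, K> key) {
        return rows.stream().collect(Collectors.toMap(key, Function.identity(), (first, second) -> first));
    }

    public static void main(String[] args) {
        // Each row is {id, column3}.
        List<String[]> db1 = Arrays.asList(new String[]{"1", "Blue"}, new String[]{"3", "Black"},
                                           new String[]{"3", "Duplicate"});
        List<String[]> db2 = Arrays.asList(new String[]{"1", "Blue"}, new String[]{"4", "Green"});
        Map<String, String[]> db1Map = indexKeepFirst(db1, row -> row[0]);
        Map<String, String[]> db2Map = indexKeepFirst(db2, row -> row[0]);

        System.out.println("Only in DB1: " + onlyIn(db1Map.keySet(), db2Map.keySet())); // Only in DB1: [3]
        System.out.println("Only in DB2: " + onlyIn(db2Map.keySet(), db1Map.keySet())); // Only in DB2: [4]
        System.out.println(db1Map.get("3")[1]); // Black
    }
}
```

Ids reported by onlyIn can go straight into the mismatch report as missing rows, and only the ids present on both sides then need a field-level diff.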