-
Defect
-
Resolution: Done
-
Critical
-
1.0.0-RC2
-
None
-
None
Hi Luca, Nate,
I am testing the RC2 / esg.ucar.edu site. I've found something strange relating to dataset_version and logical_file. In general, each logical file appears twice in each dataset version. I'm not sure how or when this may have occurred. Note that this is not the case on the esg.prototype site nor at pcmdi.
From what I see, most (logical_file, dataset_version) tuples exist twice in the logical_file_dataset_version_xref table. Perhaps this is the result of a bug in the harvesting process or in the data conversion (or something else?).
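One way to confirm the duplication is to group the xref rows by the (logical_file_id, dataset_version_id) pair and count them. The following is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the real esg database, and the two-column table layout and the sample rows are assumptions made for illustration, not the actual schema or data.

```python
import sqlite3


def find_duplicate_pairs(conn):
    """Return (logical_file_id, dataset_version_id, count) for every pair
    that appears more than once in the xref table."""
    return conn.execute(
        """
        SELECT logical_file_id, dataset_version_id, COUNT(*) AS n
        FROM logical_file_dataset_version_xref
        GROUP BY logical_file_id, dataset_version_id
        HAVING COUNT(*) > 1
        ORDER BY logical_file_id, dataset_version_id
        """
    ).fetchall()


# Stand-in database; the real table almost certainly has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE logical_file_dataset_version_xref (
        logical_file_id    INTEGER NOT NULL,
        dataset_version_id INTEGER NOT NULL
    )
    """
)
# Made-up sample rows: files 1 and 2 are doubled, file 3 is not.
conn.executemany(
    "INSERT INTO logical_file_dataset_version_xref VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (2, 10), (3, 11)],
)

duplicates = find_duplicate_pairs(conn)
for logical_file_id, dataset_version_id, n in duplicates:
    print(logical_file_id, dataset_version_id, n)
```

The same GROUP BY / HAVING query should carry over to other SQL engines largely unchanged, since it uses only standard SQL.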
As a result, almost all datasets appear to contain twice the correct number of files, which we need to correct.
From what I can tell, this is the case for ~95% of the logical files in the system. All datasets published prior to 2/17/2010 exhibit this problem. (No datasets published since yesterday exhibit the problem, though.)
I cannot reproduce this problem by publishing new datasets to the prototype.
Most likely we need to "fix" this and remove all duplicate rows in logical_file_dataset_version_xref, definitely prior to announcing external access to esg.ucar.edu. Adding a unique constraint on (logical_file_id, dataset_version_id) sounds good, too. (I think there is already an index on these two columns.)
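The cleanup and the constraint could look roughly like the sketch below. It is an illustration under assumptions, not a tested migration: it runs against an in-memory SQLite stand-in, the two-column table layout and sample rows are made up, the index name uq_lf_dv_xref is hypothetical, and it relies on SQLite's implicit rowid to decide which copy of each pair to keep. On the production database a primary key or another engine-specific row identifier would have to take the place of rowid.

```python
import sqlite3


def remove_duplicates_and_constrain(conn):
    """Delete all but one row per (logical_file_id, dataset_version_id)
    pair, then add a unique index so duplicates cannot come back.
    Returns the number of rows deleted."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            """
            DELETE FROM logical_file_dataset_version_xref
            WHERE rowid NOT IN (
                SELECT MIN(rowid)
                FROM logical_file_dataset_version_xref
                GROUP BY logical_file_id, dataset_version_id
            )
            """
        )
        removed = cur.rowcount
        # Hypothetical index name; creation fails if duplicates remain.
        conn.execute(
            """
            CREATE UNIQUE INDEX uq_lf_dv_xref
            ON logical_file_dataset_version_xref
               (logical_file_id, dataset_version_id)
            """
        )
    return removed


# Stand-in database with made-up rows: files 1 and 2 are doubled.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE logical_file_dataset_version_xref (
        logical_file_id    INTEGER NOT NULL,
        dataset_version_id INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO logical_file_dataset_version_xref VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (2, 10), (3, 11)],
)

removed = remove_duplicates_and_constrain(conn)
remaining = conn.execute(
    "SELECT logical_file_id, dataset_version_id "
    "FROM logical_file_dataset_version_xref "
    "ORDER BY logical_file_id, dataset_version_id"
).fetchall()

# With the unique index in place, re-inserting an existing pair is rejected.
try:
    conn.execute(
        "INSERT INTO logical_file_dataset_version_xref VALUES (?, ?)", (1, 10)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Doing the delete and the index creation in one transaction means a failure in either step leaves the table as it was. It would be prudent to back up the table before running anything like this against production.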
I was reviewing the liquibase updates applied to the prod esg database, and I'm a little confused about exactly which ones have been applied. It could be that something went wrong during a liquibase update and caused this problem, though this is unclear.
-Eric