We are excited to share a significant update on the progress of our technical strategy. In direct response to the key findings from the Technical Review of the OAPEN Library and the Directory of Open Access Books (DOAB) by the Curtin Institute for Data Science, alongside our internal audit, we are moving from analysis to concrete implementation.
These changes, initiated and driven forward by the now fully staffed OAPEN and DOAB technology team, are a concrete step in addressing the report’s core recommendations around scalability, automation, auditability, and the reduction of manual metadata handling.
From semi-automated to fully automated
Our semi-automated workflow (involving manual file discovery and handling, automated processing, and manual validation) will be replaced by a fully automated pipeline. This new system handles file discovery, management, transformation, and validation end-to-end. While the list of changes may look simple, it is the result of months of rethinking how we approach metadata deposit, transformation, and quality control for greater sustainability and robustness.
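To make the pipeline stages concrete, here is a deliberately simplified sketch in Python. All names and the in-memory "feed" are hypothetical illustrations of the discovery, transformation, and validation stages, not our actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    isbn: str
    title: str = ""
    errors: list = field(default_factory=list)

def discover(feed):
    """Stage 1: find incoming metadata records (here: an in-memory feed)."""
    return [Record(**raw) for raw in feed]

def transform(record):
    """Stage 2: normalise fields (e.g. strip stray whitespace from the title)."""
    record.title = record.title.strip()
    return record

def validate(record):
    """Stage 3: record missing critical fields instead of failing mid-run."""
    if not record.title:
        record.errors.append("missing title")
    return record

def run_pipeline(feed):
    """Discovery -> transformation -> validation, end to end."""
    return [validate(transform(r)) for r in discover(feed)]
```

The key design point the sketch illustrates is that validation collects errors on each record rather than aborting the run, so one bad record no longer blocks a whole batch.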
Tangible impact so far:
- Increased throughput: From processing ~10-100 books per day to hundreds or even thousands per day (once incoming metadata is correct and testing is complete).
- Major efficiency gain: Processing up to 10 times faster compared to the previous workflow.
Key technical improvements:
- Migration from Visual Basic for Applications (VBA) to Python: For better long-term maintainability, enhanced testing, and easier integration with internal and external services.
- Early validation and error reporting: Identifying critical metadata gaps sooner and notifying publishers faster.
- Expanded ONIX coverage: Includes legacy and non-standard field locations used by older or variant ONIX feeds.
- Automatic record separation: Records automatically sorted into those that are valid and ready to publish, and those requiring metadata improvement, significantly speeding up the review process.
- Improved auditability: Detailed run summaries and structured reporting of missing data and issues.
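The last two improvements, automatic record separation and structured run summaries, can be sketched together. The function and field names below are illustrative assumptions, not our production schema:

```python
def separate(records):
    """Automatic record separation: publish-ready vs. needing improvement."""
    ready = [r for r in records if not r["errors"]]
    flagged = [r for r in records if r["errors"]]
    return ready, flagged

def run_summary(ready, flagged):
    """Structured summary of a processing run, supporting auditability."""
    return {
        "processed": len(ready) + len(flagged),
        "ready_to_publish": len(ready),
        "needing_improvement": len(flagged),
        "issues": {r["isbn"]: r["errors"] for r in flagged},
    }
```

Because the summary is structured data rather than free text, it can be stored per run and queried later, which is what makes audits and publisher notifications straightforward.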
What we’re working on now:
- Further hardening the metadata transformation to reduce edge-case failures.
- Developing an improved metadata schema better aligned with library and community standards.
- Updating publisher guidelines to reflect our changes and to make critical fields mandatory.
- Close coordination with publishers to improve data quality, consistency, and alignment.
What this means for publishers
To support publishers through this transition, we will be sharing updated guidance on the required and recommended metadata fields, along with practical examples and common issues to avoid. The new workflow is designed to make deposit faster and more predictable when metadata is complete and to provide earlier, clearer feedback when critical elements are missing, so issues can be corrected quickly and with less back and forth. If records are flagged as needing improvement, publishers will receive structured information on what needs attention, helping them resolve problems efficiently and resubmit with confidence. We will also proactively contact affected publishers where we see recurring issues or where changes in requirements may have an impact, with the goal of ensuring continuity of service and minimizing disruption.
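As an illustration of what "structured information on what needs attention" could look like for a flagged record, here is a hypothetical feedback shape (the identifier, field names, and suggested action are invented for this example and do not represent our final format):

```python
# Illustrative feedback for one flagged record (hypothetical shape).
feedback = {
    "isbn": "978-0-000-00000-0",      # placeholder identifier
    "status": "needs_improvement",
    "issues": [
        {
            "field": "Language",
            "problem": "missing",
            "action": "supply a language code in the ONIX record",
        },
    ],
}
```

The intent is that each issue names the affected field, the problem, and a concrete corrective action, so publishers can resolve problems and resubmit without a round of clarifying emails.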
We want to emphasise that the full potential of this new system can only be realised with clean, complete metadata.
Meet the team behind the changes
This strategic shift is powered by the dedication of our technology team:
Dr. Anna Wałek | Head of Technology
Leads technology development and infrastructure strategy, driving the implementation of our technical roadmap.
Wiktor Florian | Metadata & Systems Specialist
Lead engineer for the core automation project, leveraging his data engineering background.
Sînziana Păltineanu | Metadata & Systems Specialist
Provides subject matter expertise and ensures metadata quality and standards.
Hanna Varachkina | Metadata & Systems Specialist
Coordinates publisher communications and analyses common metadata anomalies.
Vaggelis Theodorakopoulos | Platform Engineer
Responsible for DSpace development and migration coordination to new systems.
We will continue to share updates as our new workflow matures. Thank you for your continued support of open access and metadata quality. If you have any questions about these changes, please contact us at [email protected].