ALTO Version 4.0 has been officially released. You can find the 4.0 schema at the Library of Congress (https://www.loc.gov/standards/alto/) and on GitHub (https://github.com/altoxml/schema/blob/master/v4/alto-4-0-draft.xsd).
Summary of changes for version 4.0
- Changed schema version to 4.0
- Changed namespace and targetNamespace to http://www.loc.gov/standards/alto/ns-v4#
- Clarified and defined the licensing of this ALTO standard as the common "CC BY-SA 4.0" license (with agreement of the authors)
- Added character-based text description with the new Glyph element and its subelement Variant (GlyphType, VariantType)
- Extended the annotation to clarify the difference between the existing ALTERNATIVE element and Glyph/Variant
- Introduced generic "Processing" and deprecated "OcrProcessing"
- Introduced generic "processingStep" with "ProcessingStepType" and required attribute "ID", and deprecated "preProcessingStep", "ocrProcessingStep", "postProcessingStep"
- Added a common vocabulary for "processingStep" comprising "ContentGeneration", "ContentModification", "PreOperation", "PostOperation", "Other"
- Fixed the Shape element: it can now be used only once within a PageSpace or a TextLine, as originally intended
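As an illustration of the new namespace and the Glyph/Variant elements, the following is a minimal Python sketch that reads glyph candidates from an ALTO 4 document. The sample XML is hypothetical and heavily trimmed (it omits required attributes such as IDs and coordinates, so it is not schema-valid). The attribute names CONTENT, GC (glyph confidence) and VC (variant confidence) reflect our reading of the 4.0 schema and should be checked against the XSD. The function name glyph_candidates is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace introduced with ALTO 4.0 (see the change list above).
ALTO_V4_NS = "http://www.loc.gov/standards/alto/ns-v4#"
NS = {"alto": ALTO_V4_NS}

# Hypothetical, trimmed sample: real documents carry IDs, coordinates, etc.
SAMPLE = f"""<alto xmlns="{ALTO_V4_NS}">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="in">
      <Glyph CONTENT="i" GC="0.91">
        <Variant CONTENT="l" VC="0.06"/>
        <Variant CONTENT="1" VC="0.03"/>
      </Glyph>
      <Glyph CONTENT="n" GC="0.98"/>
    </String>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def glyph_candidates(xml_text):
    """Return (glyph, confidence, [(variant, confidence), ...]) per Glyph.

    Assumes GC and VC are present on every Glyph and Variant, which holds
    for the sample above but may not hold for real documents.
    """
    root = ET.fromstring(xml_text)
    result = []
    for glyph in root.iterfind(".//alto:Glyph", NS):
        variants = [
            (v.get("CONTENT"), float(v.get("VC")))
            for v in glyph.findall("alto:Variant", NS)
        ]
        result.append((glyph.get("CONTENT"), float(glyph.get("GC")), variants))
    return result


if __name__ == "__main__":
    for content, confidence, variants in glyph_candidates(SAMPLE):
        print(content, confidence, variants)
```

Because the namespace changed with this major version, element lookups written against ns-v3 will not match ALTO 4 documents until the namespace URI is updated.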
Comments about the schema and its documentation, as well as additional use cases for the new schema features, are encouraged at https://github.com/altoxml/schema/issues (GitHub account required).
See https://github.com/altoxml/documentation/tree/master/v4/GlyphSamples for use cases that employ glyphs.
ALTO schema versions increase by a whole number for changes that break backward compatibility (for example, version 1 to version 2) and by a decimal for changes that do not (for example, 3.0 to 3.1). The namespace changes only with major versions (for example, ns-v2 to ns-v3).
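The versioning policy above can be expressed as a small check. The following Python sketch is only an illustration: the helper names namespace_major and same_major_version are hypothetical, and the pattern assumes namespace URIs end in "ns-vN#" as in the examples given in this announcement.

```python
import re


def namespace_major(ns_uri):
    """Extract the major version from a URI ending in 'ns-vN#', or None."""
    match = re.search(r"/ns-v(\d+)#?$", ns_uri)
    return int(match.group(1)) if match else None


def same_major_version(version_a, version_b):
    """True if two version strings such as '3.0' and '3.1' share a whole number.

    Under the stated policy, versions with the same whole number do not
    break backward compatibility and share one namespace.
    """
    return version_a.split(".")[0] == version_b.split(".")[0]


if __name__ == "__main__":
    print(namespace_major("http://www.loc.gov/standards/alto/ns-v4#"))
    print(same_major_version("3.0", "3.1"))
    print(same_major_version("3.1", "4.0"))
```

A consumer could use such a check to decide whether an incoming document falls within a major version it already supports.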
ALTO Editorial Board