
Possibly -- no, almost certainly :) -- of interest to this list.

Michele
*******
Michele Combs
Lead Archivist, Special Collections Research Center
Syracuse University
315-443-2081
[log in to unmask]

________________________________________
From: Steven D Majewski [log in to unmask] <[log in to unmask]>
Sent: Monday, June 20, 2016 4:18 PM
To: [log in to unmask]
Subject: Word to other XML conversion. [ Re: [xsl] where to look for xsl folk..]

We have an application that was used to interactively convert Word document finding aids into EAD XML.

  https://github.com/uvalib/transmog

and I believe it can be adapted to convert to TEI XML instead.


The templates here are a set of rules that apply regular expressions to the headings to guess which XML element each paragraph should be assigned to, and it looks like they could probably be reconfigured to output TEI instead of EAD.

  https://github.com/uvalib/transmog/tree/master/src/main/resources
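For readers unfamiliar with the approach, here is a minimal Python sketch of heading-based classification. The rule table, patterns, and element choices are hypothetical illustrations, not transmog's actual templates (those live in the resources directory above). Under this assumption, retargeting to TEI would mean swapping in a different rule table rather than changing the code.

```python
import re

# Hypothetical rule table: (regex applied to a heading, target EAD element).
# Illustrative only; these are NOT the rules shipped in transmog's templates.
EAD_RULES = [
    (re.compile(r"^\s*(scope\s+and\s+content|scope\s+note)\b", re.I), "scopecontent"),
    (re.compile(r"^\s*(biographical|historical)\s+(note|sketch)\b", re.I), "bioghist"),
    (re.compile(r"^\s*(access|restrictions?)\b", re.I), "accessrestrict"),
    (re.compile(r"^\s*(series|box|folder)\s+\d+", re.I), "c"),
]


def guess_element(heading, rules, default="p"):
    """Return the element of the first rule whose regex matches the heading.

    Rules are tried in order, so more specific patterns should come first;
    anything unmatched falls back to `default`.
    """
    for pattern, element in rules:
        if pattern.search(heading):
            return element
    return default


def classify(headings, rules):
    """Pair each heading with its guessed element, for a human to review."""
    return [(heading, guess_element(heading, rules)) for heading in headings]


if __name__ == "__main__":
    sample = ["Scope and Content", "Biographical Note", "Box 3", "Preface"]
    for heading, element in classify(sample, EAD_RULES):
        print(f"{element:14} <- {heading}")
```

Running it prints one guess per heading. "Preface" matches no rule and falls back to `p`, which is exactly the kind of guess a reviewer would then correct by hand.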


The webapp displays those guesses and lets you rearrange or reassign them.
So it doesn't solve the problem of writing XSLT conversion rules, but it does help with converting documents that don't exactly follow those rules.
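The review step can be sketched the same way: treat the guesses as an ordered list of (paragraph, element) pairs, apply the user's reassignments and moves, and only then serialize. This is a hypothetical Python illustration, not how the transmog webapp is implemented; it also emits a flat element list, whereas real EAD nests components and puts headings in their own elements.

```python
import xml.etree.ElementTree as ET


def reassign(assignments, overrides):
    """Return a copy where each index in `overrides` gets the user's chosen element."""
    return [(text, overrides.get(i, element)) for i, (text, element) in enumerate(assignments)]


def move(assignments, src, dst):
    """Return a copy with the item at index `src` moved to index `dst`."""
    items = list(assignments)
    items.insert(dst, items.pop(src))
    return items


def serialize(assignments, root_tag="archdesc"):
    """Emit the reviewed assignments as a flat XML string (simplified, not valid EAD)."""
    root = ET.Element(root_tag)
    for text, element in assignments:
        ET.SubElement(root, element).text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical guesses: the second paragraph was mis-guessed as a plain <p>.
    guesses = [("Biographical Note", "bioghist"), ("Scope and Content", "p"), ("Box 1", "c")]
    reviewed = move(reassign(guesses, {1: "scopecontent"}), 1, 0)
    print(serialize(reviewed))
```

This prints `<archdesc><scopecontent>Scope and Content</scopecontent><bioghist>Biographical Note</bioghist><c>Box 1</c></archdesc>`. The helpers return copies, so the original guesses remain available for comparison during the manual QA pass.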

Typically the converted documents still require some manual QA and editing.


— Steve Majewski / UVA Alderman Library





> On Jun 20, 2016, at 3:30 PM, G. Ken Holman [log in to unmask] <[log in to unmask]> wrote:
>
> Indeed hard does not mean impossible.  The Inera folks have a strong product named eXtyles for going from Word to various JATS derivatives including ISOSTS that I am personally interested in:
>
>  http://www.inera.com/resources/extyles-related-technologies
>
> I haven't heard much of any other Word-based products ... but I post this to point out that it has been done successfully commercially.
>
> . . . . . . . Ken
>
> At 2016-06-20 18:58 +0000, Wendell Piez [log in to unmask] wrote:
>
>> Hi,
>>
>> On Mon, Jun 20, 2016 at 10:36 AM, Christopher R. Maden [log in to unmask]
>> <[log in to unmask]> wrote:
>> > On 06/19/2016 04:17 PM, adam [log in to unmask] wrote:
>> >>
>> >> We are working with docx files that need to be translated into HTML. The
>> >> docx files are chapters of scholarly content that constitute a book. We
>> >> need to translate the docx into a tidy HTML version with direct
>> >> translation of semantic elements but with the elimination of styles.
>> >
>> > There are a few tools to do this kind of thing.  The Public Knowledge
>> > Project is working on integrating them into a pipeline; it's not ready for
>> > prime time *quite* yet, but it's getting there, and the individual
>> > components may be useful to you on their own.  Check out <URL:
>> > https://github.com/pkp/xmlps > for source and more info.
>>
>> Indeed there are a number of different such initiatives, some of them
>> including XSLT and so on topic. :-)
>>
>> (In fact didn't Eliot recently mention his thing for a Word -> DITA pathway?)
>>
>> Whether using XSLT (and on topic) or not -- converting from Word (what
>> I like to call a 'paintbrush' application) into strong markup is going
>> to be a hard problem, largely because its boundaries are not in an
>> obvious place, plus they move. It will always be contested what is in
>> scope vs what is not, and there will be a tradeoff between generic and
>> specialized capabilities.
>>
>> Hard doesn't mean impossible, however, and what would be nice would be
>> a toolkit that could be adapted for local use....
>>
>> Cheers, Wendell
>>
>> --
>> Wendell Piez | http://www.wendellpiez.com
>> XML | XSLT | electronic publishing
>> Eat Your Vegetables
>> _____oo_________o_o___ooooo____ooooooo_^
>>
>
>
> --
> Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
> Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK |
> Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/s/ |
> G Ken Holman _ _ _ _ _ _ _ _ _ _ mailto:[log in to unmask] |
> Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
> Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |
>
>
>
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--