Sometimes you get XSD schemas that are full of additional annotations and comments you want to strip.
Annotations and comments are nice, but they get annoying when you try to compare two incremental versions of a schema and pinpoint just the differences.
Below is a script that removes all comments and all <xs:annotation> blocks from an XSD file.
A sample of such a schema is given below:
<!-- AccountIdentification4Choice1 -->
<xs:complexType name="AccountIdentification4Choice1">
<xs:annotation>
<xs:documentation source="Name" xml:lang="EN">AccountIdentification4Choice__1</xs:documentation>
<xs:documentation source="Definition" xml:lang="EN">Specifies the unique identification of an account as assigned by the account servicer.</xs:documentation>
</xs:annotation>
<xs:choice>
<xs:element name="IBAN" type="IBAN2007Identifier">
<xs:annotation>
<xs:documentation source="Name" xml:lang="EN">IBAN</xs:documentation>
<xs:documentation source="Definition" xml:lang="EN">International Bank Account Number (IBAN) - identifier used internationally by financial institutions to uniquely identify the account of a customer. Further specifications of the format and content of
the IBAN can be found in the standard ISO 13616 "Banking and related financial services - International Bank Account Number (IBAN)" version 1997-10-01, or later revisions.</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element name="Othr" type="GenericAccountIdentification11">
<xs:annotation>
<xs:documentation source="Name" xml:lang="EN">Other</xs:documentation>
<xs:documentation source="Definition" xml:lang="EN">Unique identification of an account, as assigned by the account servicer, using an identification scheme.</xs:documentation>
</xs:annotation>
</xs:element>
</xs:choice>
</xs:complexType>
The following script does the clean-up for all the XSD schema files in the current directory.
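#!/bin/bash
for file in *.xsd
do
    # Skip the output of a previous run so it is not cleaned a second time
    [[ $file == *-clean.xsd ]] && continue
    xmlstarlet ed -P -d "//*/xs:annotation" "$file" \
    | sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' \
    | grep -zv '^<!--' \
    | tr -d '\0' \
    | sed '/^[[:space:]]*$/d' \
    > "${file%.xsd}-clean.xsd"
done
Where:
[[ $file == *-clean.xsd ]] && continue
Skips the files produced by an earlier run, so running the script twice does not create myschema1-clean-clean.xsd
xmlstarlet ed -P -d "//*/xs:annotation" "$file" \
Parses the XSD/XML and deletes every node matched by the given XPath expression, the <xs:annotation> blocks in my case. The file name is quoted so that names containing spaces still work
| sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' \
| grep -zv '^<!--' \
| tr -d '\0' \
Removes all the comment blocks from the XSD/XML. The sed command puts a NUL byte before every <!-- and after every -->, grep -z then reads the stream as NUL-separated records and -v drops the records that start with <!--, and tr deletes the NUL bytes again. Because the records are split on NUL and not on newlines, comments that span several lines are removed as well
| sed '/^[[:space:]]*$/d' \
Removes all lines that are empty or contain only whitespace. These are artefacts left behind by the previous two removal operations
> "${file%.xsd}-clean.xsd"
Writes the result to a new file, overwriting the result of any previous run instead of appending to it. If the initial file was myschema1.xsd the result file will be myschema1-clean.xsd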
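If you want to try the comment-stripping stage and the output naming on their own, without XMLStarlet installed, something like the sketch below should do. The function names strip_comments and clean_name and the tiny sample document are made up for this illustration; only the pipeline stages and the suffix-stripping expansion come from the script. It assumes GNU sed and GNU grep.

```shell
#!/bin/bash
# Hypothetical helpers for experimenting; not part of the original script.

# Comment-stripping stage: reads XML on stdin, writes it without comments.
strip_comments() {
    sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' \
    | grep -zv '^<!--' \
    | tr -d '\0' \
    | sed '/^[[:space:]]*$/d'
}

# Output name: drop the .xsd suffix, then append -clean.xsd.
clean_name() {
    printf '%s\n' "${1%.xsd}-clean.xsd"
}

# Made-up sample with a one-line, a multi-line and a trailing comment.
sample='<a>
<!-- one line -->
<b/>
<!-- spans
two lines -->
<c/><!-- trailing -->
</a>'

printf '%s\n' "$sample" | strip_comments
clean_name myschema1.xsd
```

With the GNU tools this should print the four element lines (<a>, <b/>, <c/>, </a>) with no comments in between, followed by myschema1-clean.xsd.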
Resources:
All the commands are standard Unix command-line tools like sed, grep and tr, which can be found on other flavours of Unix, not just Linux. Keep in mind that the \x0 escape in sed and the -z option of grep are GNU extensions rather than POSIX features, so other implementations of these tools may need small adjustments.
The only additional software is the wonderful open source tool XMLStarlet, which has releases for Linux, Windows and Solaris.
Basically, the above script can be applied with small changes to any other supported OS.
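One of those small changes may be the name of the XMLStarlet executable. As far as I know, some installations ship it as xml instead of xmlstarlet, but treat that as an assumption to check on your own system. A minimal sketch of a lookup that tries both names is shown here; find_xmlstarlet is a hypothetical helper name, not something that comes with XMLStarlet.

```shell
#!/bin/bash
# Hypothetical helper: print the first XMLStarlet executable name found on PATH.
# The candidate names (xmlstarlet, xml) are assumptions; adjust for your system.
find_xmlstarlet() {
    local candidate
    for candidate in xmlstarlet xml; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if xs_bin="$(find_xmlstarlet)"; then
    echo "Using XMLStarlet binary: $xs_bin"
else
    echo "XMLStarlet not found on PATH" >&2
fi
```

The clean-up loop could then call "$xs_bin" ed ... in place of the hard-coded xmlstarlet.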