I have a ton of Oracle Forms XML export files and wanted to know which distinct patterns occur as values of the FormatMask XML attribute. The input looks as follows:
<Item Name="CREATION_DATE" UpdateAllowed="false" DirtyInfo="false" Visible="false" QueryAllowed="false" InsertAllowed="false" Comment="TABLE ALIAS&#10; FDA&#10;&#10;BASED ON TABLE&#10; TMI_FINANCIAL_DATA&#10;&#10;COLUMN USAGES&#10; ... CREATION_DATE SEL&#10;" ParentModule="OBJLIB1" Width="10" Required="false" ColumnName="CREATION_DATE" DataType="Date" ParentModuleType="25" Label="Creation Date" ParentType="15" ParentName="QMSSO$QUERY_ONLY_ITEM" MaximumLength="10" PersistentClientInfoLength="142" ParentFilename="tmiolb65_mla.olb" FormatMask="DD-MM-RRRR">
A naive grep command would print the whole matching line, including the file name. After a few iterations I arrived at the following command, which does what I want in a single line:
grep -R -h -o -e 'FormatMask="[^"]*' * | sed 's/FormatMask="//g' | sort | uniq
What the command does is:
- grep recursively (-R) for a regular expression (-e)
- match FormatMask=" followed by all characters up to, but not including, the closing quotation mark
- print only the matching part of the line (-o). This will include the prefix FormatMask="
- print without the file name (-h)
- strip off the prefix with sed
- sort the results alphabetically
- remove duplicate lines (uniq); uniq only collapses adjacent duplicates, which is why the sort has to come first
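The steps above can be wrapped in a small shell function so the same pipeline works for other attributes. This is only a sketch: the name list_attr_values and its two parameters are made up for illustration, and it assumes the attribute name contains no regular expression metacharacters and that values are always enclosed in double quotes.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print the distinct values
# of one XML attribute found in any file below a directory.
#   $1 = attribute name, e.g. FormatMask (assumed free of regex metacharacters)
#   $2 = directory to search, defaults to the current directory
list_attr_values() {
    attr=$1
    dir=${2:-.}
    # -R recurse, -h suppress file names, -o print only the matching part
    grep -R -h -o -e "${attr}=\"[^\"]*" "$dir" |
        # strip the leading  attr="  so only the value remains
        sed "s/^${attr}=\"//" |
        # sort first, because uniq only removes adjacent duplicates
        sort |
        uniq
}
```

Calling list_attr_values FormatMask . from the directory that holds the exports should then give the same list as the one-liner.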
The result (excerpt) is:
00
09
099
0999
0999999
09999999
0D0
0D999
0D9999
9
90
90D0
90D000
...
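If it also matters how often each pattern occurs, a small variation of the same pipeline can count the values. Again just a sketch under the same assumptions (double-quoted values, attribute name without regex metacharacters); count_attr_values is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: like the one-liner above, but prefix every
# distinct value with the number of times it occurs, most frequent first.
#   $1 = attribute name, e.g. FormatMask
#   $2 = directory to search, defaults to the current directory
count_attr_values() {
    attr=$1
    dir=${2:-.}
    grep -R -h -o -e "${attr}=\"[^\"]*" "$dir" |
        sed "s/^${attr}=\"//" |
        sort |
        # -c prefixes each distinct line with its number of occurrences
        uniq -c |
        # numeric sort on the count, highest first
        sort -rn
}
```

The exact padding in front of the counts depends on the uniq implementation, so pipe the output through something like awk if it has to be parsed further.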