Performance Comparisons
FORTIS REVOLUTION VS. FORTIS 3 SEGMENTATION REPORT
REPORT DRIVEN BY .RC, .DOC, AND .MIF TEST FILES
Resource file (.rc) Results
The bottom line is that, for .RC files, Fortis 3 produces twice as many segments as Fortis Revolution because it turns tags into segments of their own. In other words, Fortis Revolution is twice as inclusive when it comes to tag segmentation.
| | Not Trans. | Total Seg. | Not Trans. | Total Seg. | Not Trans. | Total Seg. |
|---|---|---|---|---|---|---|
| Fortis 3 | 132 | 272 | 230 | 490 | 136 | 274 |
| Fortis Revolution | 135 | 135 | 247 | 247 | 136 | 136 |
| Seg: F3/FR | | 2.01 | | 1.98 | | 2.01 |
Table 1.1. Three test cases on RC files yield an average of exactly twice as many segments in Fortis 3 as in Fortis Revolution, with nearly equal numbers of untranslated segments.
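The ratios in the last row of Table 1.1 can be rechecked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and the totals are copied from the table.

```python
# Segment totals copied from Table 1.1 (three .RC test cases).
fortis3_totals = [272, 490, 274]
revolution_totals = [135, 247, 136]

# Ratio of Fortis 3 segments to Fortis Revolution segments per test case.
ratios = [f3 / fr for f3, fr in zip(fortis3_totals, revolution_totals)]
average = sum(ratios) / len(ratios)

print([round(r, 2) for r in ratios])  # [2.01, 1.98, 2.01]
print(round(average, 2))              # 2.0
```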
In Table 1.1 the import parameters were set to mark empty segments as pretranslated. This left the segments with translatable text untranslated, while the tag segments were marked as pretranslated. Segmentation differed so drastically because |FMT tags in Fortis 3 were counted as separate segments, which put a tag-segment between every pair of text-bearing segments in the file.
| Fortis Revolution | Fortis 3 |
|---|---|
| <section> Pocket PC Device Settings<p x='0' t='caption'>«1»
OK<p x='1' t='defpushbutton'>«2» Cancel<p x='2' t='pushbutton'>«3» Leave deleted e-mail in your PIM <p x='3' t='control'>«4» Help<p x='4' t='pushbutton'>«5» Private<p x='5' t='ce_task_taskprivate'>«6» First name<p x='6' t='ce_contact_firstname'>«7» Middle name<p x='7' t='ce_contact_middlename'>«8» | |FMT - Dialog#«1+»
<Coordinates:0, 0, 191, 55>«2» |FMT - Caption#«3+»¬ Pocket PC Device Settings«4» |FMT - DefPushbutton#«5+» <Coordinates:14, 35, 50, 14> OK «6» |FMT - Pushbutton#«7+» <Coordinates:70, 35, 50, 14> Cancel«8» |FMT - Control Button#«9+» <Coordinates:39, 13, 131, 10> &Leave deleted e-mail in your PIM«10» |FMT - Pushbutton#«11+» <Coordinates:126, 35, 50, 14> Help«12» |FMT - StringTable#«13+» Private«14» |FMT - StringTable#«15+» First name«16» |FMT - StringTable#«17+» Middle name«18» |FMT - StringTable#«19+» |
Microsoft Word (.doc) Results
The segment count is still higher in Fortis 3 when using the Word filter, but the difference is much smaller than with the .RC filter. On average, Fortis 3 produces 1.13 times as many segments as Fortis Revolution.
| | Not Trans. | Total Seg. | Not Trans. | Total Seg. | Not Trans. | Total Seg. |
|---|---|---|---|---|---|---|
| Fortis 3 | 352 | 552 | 288 | 500 | 1224 | 1841 |
| Fortis Revolution | 425 | 518 | 366 | 440 | 1438 | 1544 |
| Seg: F3/FR | | 1.07 | | 1.14 | | 1.19 |
Table 1.2. Three .DOC file test cases show the Word Filter segmentation to be an average of 1.13 times greater in Fortis 3.
In .DOC files, the |FMT tag difference is still the main cause of the segment count disparity, although there are not as many of these tags as in .RC files. Fortis Revolution has fewer segments overall, but the other differences in segmentation arise from the regular expression engine: when the Fortis 3 regular expressions are run in the Fortis Revolution regular expression engine, Fortis Revolution overlooks segmentation exceptions defined by those expressions.
| Fortis Revolution | Fortis 3 |
|---|---|
| This application claims priority from US Provisional Application Serial No.«5»
60/727,472 by Boardman et al.«6» , entitled "Method of Making Light Emitting Device with Silicon-Containing Encapsulant", filed Oct. «7» 17, 2005 (61339US002).<PP "Normal" 2>«8» This application is related to: «9» commonly assigned, co-pending U.S. «10» Patent Application Ser. «11» | <F 0><Tab>This application claims priority from US Provisional Application Serial No. 60/727,472 by Boardman et al., entitled "Method of Making Light Emitting Device with Silicon-Containing Encapsulant", filed Oct. 17, 2005 (61339US002). «20»
|FMT - Text#«21+» <Tab>This application is related to: «22» commonly assigned, co-pending U.S. Patent Application Ser. «23» |
| Fortis Revolution | Fortis 3 |
|---|---|
| A variety of first catalysts are disclosed, for example, in U.S. «162»
Pat. «163» Nos. «164» 6,376,569 (Oxman et al.)«165» , 4,916,169 (Boardman et al.)«166» , 6,046,250 (Boardman et al.)«167» , 5,145,886 (Oxman et al.)«168» , 6,150,546 (Butts), 4,530,879 (Drahnak), 4,510,094 (Drahnak) 5,496,961 (Dauth), 5,523,436 (Dauth), 4,670,531 (Eckberg), as well as International Publication No. «169» WO 95/025735 (Mignani).<PP "Normal" 3>«170» | |FMT - Text#«189+»
A variety of first catalysts are disclosed, for example, in U.S. Pat. Nos. 6,376,569 (Oxman et al.), 4,916,169 (Boardman et al.), 6,046,250 (Boardman et al.), 5,145,886 (Oxman et al.), 6,150,546 (Butts), 4,530,879 (Drahnak), 4,510,094 (Drahnak) 5,496,961 (Dauth), 5,523,436 (Dauth), 4,670,531 (Eckberg), as well as International Publication No. WO 95/025735 (Mignani). «190» |FMT - Text#«191+» |
Fortis Lichida has spurious segmentation here after acronyms and abbreviations. The exceptions list, which tells it that the period after such a word is not actually a sentence boundary, does not appear to be functioning correctly.
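The role of the exceptions list can be sketched in Python. This is a simplified illustration, not the Fortis implementation: the boundary rule is reduced to "punctuation followed by whitespace", the exception list is a hypothetical short one, and the sample is a shortened version of the sentence in the table above.

```python
import re

# Candidate sentence boundary: ., :, ? or ! followed by whitespace.
# This is a simplification of the rules reproduced in Appendix A.
BOUNDARY = re.compile(r"[.:?!]\s+")

# Hypothetical, shortened exception list. If the text up to a candidate
# boundary ends with one of these abbreviations, the break is suppressed.
EXCEPTIONS = re.compile(r"(?:^|\s)(?:U\.S|Pat|Nos?|Ser|Oct|et al)\.\s+$")

def segment(text, exceptions=None):
    """Split text after each boundary that is not covered by an exception."""
    segments, start = [], 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        if exceptions is not None and exceptions.search(text[:end]):
            continue  # abbreviation, not a real sentence boundary
        segments.append(text[start:end])
        start = end
    if start < len(text):
        segments.append(text[start:])
    return segments

sample = ("A variety of first catalysts are disclosed, for example, in U.S. "
          "Pat. Nos. 6,376,569 (Oxman et al.), as well as International "
          "Publication No. WO 95/025735 (Mignani).")

print(len(segment(sample)))              # 5: breaks after U.S., Pat., Nos., No.
print(len(segment(sample, EXCEPTIONS)))  # 1: the sentence stays whole
```

Without a working exception list, the sketch breaks at the same abbreviations where the Fortis Revolution column shows segment markers.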
Framemaker (.mif) Results
The Framemaker filter gives the highest segmentation disparity yet. There are several more tag-segments which cause the expansion.
| | Not Trans. | Total Seg. | Not Trans. | Total Seg. | Not Trans. | Total Seg. |
|---|---|---|---|---|---|---|
| Fortis 3 | 324 | 541 | 96 | 190 | 320 | 706 |
| Fortis Revolution | 302 | 302 | 79 | 79 | 222 | 236 |
| Seg: F3/FR | | 1.79 | | 2.41 | | 2.99 |
On average, Fortis 3 produces 2.40 times as many segments as Fortis Revolution. In Fortis 3, the |FMT, |Variable, |CondM, and |Frame# tags are often tag-segments.
The segmentation of sentences can be interrupted by these tags. Tags within the text have, in some cases, caused segment breaks, which will naturally reduce the number of partially translated and fuzzy-matched segments. The following is a prime example:
| Fortis Revolution | Fortis 3 |
|---|---|
| Errors on the <o t="var" x="Product Name"/><f x='zvb variable '/><f x='* '/> have many causes, including poor protocol writing, incorrect operator setup, variation in plates, hardware failure and software failure. <p t='bo Body' x='60'/>
<o t="marker" x="Conditional Text" v="+760911"/>It is important to understand that error handling is a normal part of operating the <o t="var" x="Product Name"/><f x='zvb variable '/><f x='* '/> and that errors usually do not mean that the <o t="var" x="Product Name"/><f x='zvb variable '/><f x='* '/> has malfunctioned. «35» Most errors are a result of operator error.<p t='bo Body'/> <p t='lip LinePrintOnly'/> <o t="marker" x="Cross-Ref" v="36879: mt MapTop: Compilation Warnings and Errors"/>Compilation Warnings and Errors<p t='mt MapTop' x='61'/> <p t='lim LineMap'/>«36» | Errors on the |Variable: "Product Name"#<F 0><F 1> have many causes, including poor protocol writing, incorrect operator setup, variation in plates, hardware failure and software failure. «108»
|FMT - Text#«109+» |CondM: "+760911"#It is important to understand that error handling is a normal part of operating the |Variable: "Product Name"#<F 0><F 1> and that errors usually do not mean that the |Variable: "Product Name"#<F 0><F 1> has malfunctioned. «110» Most errors are a result of operator error. «111» |FMT - Text#«112+» |XRefPoint: "36879: mt MapTop: Compilation Warnings and Errors"#Compilation Warnings and Errors«113» |
Fortis Lichida seems to be missing what should be some mandatory segmentation boundaries here.
Appendix A
For convenience, the regular expressions involved are reproduced below.
Regular Expressions governing Resource File filter segmentation in Fortis Revolution and Fortis 3:
| Fortis Revolution | Fortis 3 |
|---|---|
| <p x=.*>\s*«» | FMT –
\|FMT[!#|]+#«» |
Regular Expressions governing Word filter segmentation in Fortis Revolution and Fortis 3:
| Fortis Revolution | Fortis 3 |
|---|---|
| <(PP |TC |TR |SS
|@BR)([^">]|("([^\\"]|(\\.))*?"))*?>«» [\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*«» | «»(\n+\|FMT -&#)+
\|FMT[!#|]+#(\n\|FMT[!#|]+#)*«» [A-Za-zÀ-ÿ>/)"#][.:?!]\)*(!(\s\|?"&"(#|(\|"&"#)|(, "&"#))))(![!\n\s<(|])\s*«» |
Regular Expressions governing Framemaker filter segmentation in Fortis Revolution and Fortis 3:
| Fortis Revolution | Fortis 3 |
|---|---|
| [\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*«»
(<\s*/?\s*[px](\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s* (('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>\s*)+«» <\s*/\s*table\s*>«» | «»\n+\|FMT –
\|FMT[!#|]+#«» [A-Za-zÀ-ÿ>/)"#][.:?!]\)*(!(\s\|?"&"(#|(\|"&"#)|(,"&"#))))(![!\n\s<(|])\s*«» |
Regular Expressions governing segmentation exceptions in Fortis Revolution and Fortis 3:
| Fortis Revolution | Fortis 3 |
|---|---|
| attn\.
avg\. bldg\. bldr\.\)*\s«» chronol\.\)*\s«» Dr\. etc\.(?!\s\p{Lu}) fwd\. gds\. gra(d|ph)\. hf\. hr\. incl?\. inj\. inv\. \bpt\. (^|[\s>(\-])(ab(br*(ev)*|str)*|comp|e(lec|ncl*|ngr|tc)|fig|m(ax|i[dn]) |no|num|pt|r(ect*|el|esp*)|s(tr|ubj)|t(emp|fr)|univ|vt|wt)\.\)*\s«» ^#\s*Created on:.*$ «<\!--.*-->» «<[A-Za-z][^<>]+>» «\|((FMT - .*#)|([-A-Za-z0-9()]+:\s*".*"#)|([A-Za-z0-9()]+#))» «<Style>.*</Style>» «<Address>.*</Address>» «<Script>.*</Script>» | [\s\.]((<(!\s)[\s-;=?-~]*>)*[a-zÀ-ÿ]\.)+\)*\s«»
attn\.\)*\s«» avg\.\)*\s«» blgd\.\)*\s«» bldr\.\)*\s«» chronol\.\)*\s«» Dr\.\s*«» dz)\.\)*\s«» etc\.\s(![A-Z])«» fwd\.\)*\s«» gds\.\)*\s«» gra(d|ph)\.\)*\s«» hf\.\)*\s«» hr\.\)*\s«» incl*\.\)*\s«» inj\.\)*\s«» inv\.\)*\s«» \spt\.\)*\s«» (^|[\s>(\-])(ab(br*(ev)*|str)*|comp|e(lec|ncl*|ngr|tc)|fig|m(ax|i[dn]) |no|num|pt|r(ect*|el|esp*)|s(tr|ubj)|t(emp|fr)|univ|vt|wt)\.\)*\s«» ^#\s*Created on:&$ «<\!--&-->» «<[A-Za-z][!<>]+>» «\|((FM[tT] - &#)|([-A-Za-z0-9()]+:\s*"&"#)|([A-Za-z0-9()]+#))» «<Style>&</Style>» «<Address>&</Address>» «<Script>&</Script>» ID: |
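Comparing the two columns suggests that the Fortis 3 dialect writes a negated character class as `[!...]`, a negative lookahead as `(!...)`, and a wildcard as `&`, where Fortis Revolution uses `[^...]`, `(?!...)`, and `.*`. That reading is inferred from the side-by-side expressions only. The sketch below uses Python's `re` module as a stand-in for a conventional engine (an assumption, since the Fortis Revolution engine is not named here) to show how a Fortis 3 expression can silently stop matching when it is run unmodified. `[A-Z]` replaces `\p{Lu}` because Python's `re` has no Unicode property classes.

```python
import re

tag_line = "|FMT - Dialog#"

# Fortis 3 spelling read by a conventional engine: [!#|] is a class that
# contains '!', '#' and '|', not a negation.
as_written = re.compile(r"\|FMT[!#|]+#")
# Assumed conventional equivalent: any run of characters other than # or |.
translated = re.compile(r"\|FMT[^#|]+#")

print(as_written.search(tag_line))          # None
print(translated.search(tag_line).group())  # |FMT - Dialog#

text = "pumps, valves, etc. are included"

# Fortis 3 spelling: a conventional engine reads (![A-Z]) as a group that
# needs a literal '!' followed by an uppercase letter.
f3_exception = re.compile(r"etc\.\s(![A-Z])")
# Fortis Revolution spelling, with [A-Z] standing in for \p{Lu}.
fr_exception = re.compile(r"etc\.(?!\s[A-Z])")

print(f3_exception.search(text))          # None: the exception is overlooked
print(fr_exception.search(text).group())  # etc.
```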
FORTIS REVOLUTION VS. FORTIS 3 SEGMENTATION AND TAG PROTECTION CHANGES
Segmentation Rules
We are trying to make the regular expressions as inclusive as possible, instead of having different segmentation rules for all of the main languages. The segmentation rules have been tuned to the following:
Word 2003, by sentences
<(PP |TC |TR |SS |@BR)([^">]|("([^\\"]|(\\.))*?"))*?>«»
[\p{L}>/)"#][.:。?!]\)*(?![\p{L}".])\s*«»
Framemaker, by sentences
[\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*«»
(<\s*/?\s*[px](\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*(('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>\s*)+«»
<\s*/\s*table\s*>«»
RC
<p x=.*>\s*«»
[\p{L}>/)"#][.:?!]\)*(?![\p{L}/)<>"#])\s*«»
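As a rough illustration of the first RC rule, the sketch below cuts a string at the end of every match of `<p x=.*>\s*`. It assumes that «» marks the point where a segment ends and that each resource entry sits on its own line; the helper name and the sample layout are illustrative, with the entries borrowed from the .RC example earlier in this report.

```python
import re

# First RC segmentation rule from the list above, without the «» marker.
RC_TAG_RULE = re.compile(r"<p x=.*>\s*")

def split_after_matches(text, rule):
    """Cut text at the end of every match of rule."""
    segments, start = [], 0
    for match in rule.finditer(text):
        segments.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments

sample = (
    "Pocket PC Device Settings<p x='0' t='caption'>\n"
    "OK<p x='1' t='defpushbutton'>\n"
    "Cancel<p x='2' t='pushbutton'>\n"
)

segments = split_after_matches(sample, RC_TAG_RULE)
print(len(segments))  # 3
```

Because `.*` is greedy, several `<p x=...>` tags on one line would be swallowed by a single match in this engine. The one-entry-per-line layout keeps each match on its own line, since `.` does not cross line breaks in Python by default.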
Tag Protection
Word 2003:
This has not changed.
(\\.)|(<([^"<>]|("([^\\]|(\\.))*?"))*?>)|(\[.*?\])
Framemaker Filter:
This will protect tags as yet ignored by Framemaker.
(<\s*/*\s*[obtfpx]r*(\s+([a-zA-Z0-9_][a-zA-Z0-9_]*)\s*=\s*(('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>)+
</*table.*>
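To see what the first Framemaker expression protects, it can be run against a fragment of the .mif example from the comparison above. This is a sketch in Python's `re`, which may differ in detail from the engine used by the filter.

```python
import re

# First Framemaker tag-protection expression, copied from the list above.
FM_TAGS = re.compile(
    r"""(<\s*/*\s*[obtfpx]r*(\s+([a-zA-Z0-9_][a-zA-Z0-9_]*)\s*=\s*"""
    r"""(('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>)+"""
)

sample = """Errors on the <o t="var" x="Product Name"/><f x='zvb variable '/><f x='* '/> have many causes."""

# Adjacent tags are covered by a single match because of the outer (...)+.
protected = [m.group(0) for m in FM_TAGS.finditer(sample)]
print(len(protected))  # 1
print(protected[0])
```

Note that the expression requires the closing `/` before `>`, so only self-closing tags are covered by it.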
RC Filter:
In order to protect string table constants, I added three expressions to the RC filter tag protection settings:
<section>
<p ([ a-zA-Z0-9]+='.*?')+>
(\\[rtn])+
%(l*[fcsdx])+
%[0-9]+
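The effect of the RC tag-protection expressions can be sketched by combining them into one alternation and listing what they match in a sample string. The sample string and its `t='string'` attribute are made up for illustration, and Python's `re` is used as a stand-in engine.

```python
import re

# RC tag-protection expressions, copied from the list above.
PROTECTED = [
    r"<section>",
    r"<p ([ a-zA-Z0-9]+='.*?')+>",
    r"(\\[rtn])+",
    r"%(l*[fcsdx])+",
    r"%[0-9]+",
]
PROTECT = re.compile("|".join(f"(?:{p})" for p in PROTECTED))

def protected_spans(text):
    """Return every substring that one of the expressions would protect."""
    return [m.group(0) for m in PROTECT.finditer(text)]

# Hypothetical string-table entry; \r and \n are literal backslash escapes,
# as they appear in a resource file.
sample = r"Copied %d of %ld files to %s\r\nSee %1 for details<p x='12' t='string'>"

for span in protected_spans(sample):
    print(span)
```

The printed spans are `%d`, `%ld`, `%s`, `\r\n`, `%1`, and the trailing `<p ...>` tag, which leaves only the translatable words unprotected.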
Segmentation Exceptions
The Segmentation Exceptions have been rewritten. The expressions are long, but clear, in an effort to make their editing more manageable. Abbreviations are listed by language, in alphabetical order, and are as follows:
English:
\s(abbr|abbrev|acad|alt|AM|A\.M|AD|A\.D|apr|apt|assn|attn|aug|ave|avg|AWOL|B\.A|B\.C|B\.S|bldg|bldr|blvd|capt|chronol?|cm|cu|ctr)\.
\s(cent|ca?|co|col|cpl|comp|corp|ct|dec|dept|DC|D\.C|deg|dr|div|ed|\(?e\.g|elec|encl|engr|et\.?\s*al|etc|feb|fig|FM|ft|fwd)\.
\s(gal|gen|GMT|gov|grad|graph|hr|hwy|ibid|\(?i\.e|in|inj|incl?|inst|inv|jan|jr|kg|km|lk|ln|lib|lat|lt|ltd|long|M\.A|M\.H|mar)\.
\s(max|MD|mfg|mi|mid|min|m\.d|mph|mm|mg|mr|msgr|mo|mt|mus|nov|no|nos|nov|num|oct|oz|p+|pat|Ph\.D|pl|P\.O|pop|p\.m|prof|pt|qt)\.
\s(rd|rel|resp|rev|r\.n|rpm|sec|sept|sgt|sq|sr|sta|st|ste|str|subj|sun|temp|ter|tpk|univ|U\.S\.?A?|vol|vs|wt|yd)\.
French:
\s(agr|ap|appt|arr|art|at|av|bd|br|bt|b\.lat|cfr|chap|cit|cm|déf|dépt|div|dyn|él|élec|élect)\.
\s(électr|électron|etc|exp|établt|fig|frs|hab|ib(id)*|max|min|no|ob|pos|p\.\s*ex|qqc|qqn|réf|rép|resp|sec|soc)\.
German:
\s(abb|abh|allg|autom|adm|art|bez|bzw|ca|chf|div|dyn|dat|D\.h|dt|elekt?r?|etc|evtl|eig|ehem)\.
\s(fa|fzg|freig|getr|ggfs?|gr|gereg|ges|hinw|hydr|hsp|ind|incl|kl|kpm|kompl|kontr)\.
\s(max?(sch)?|mech|mechan|mand|mind|mod|mol|nr|nl|nschl|nsp|pkt|pos|prof|spät|spr|sec|sek|schr|std|str|spez)\.
\s(temp|tel|ttl|tzl|u\.a|usw|ve?rz|ve?bl|vgl|vgl\.\s*z\.\s*B|vol|z\.\s*B|zchng|zul|zw|zyl|z\.?t|amp|ant|el|eing|fig|ger|ges|ltr?|od|st(at)?)\.
Dutch:
The Dutch exceptions seem a bit out of hand. Any expertise in this area that would help to weed out erroneous abbreviations or verify useful ones would be welcome.
\s(acad|acc|adj|adm|afb|afd|afk|afl|afz|al|alg|alt|apr|arr|art|asp|atm|betr|bijl|bijz)\.
\s(blz|br|burg|ca|cand|cap|cat|cf|chr|cie|cod|coöp|ct|dag|dat|deb|dec|del|dept|derg|dgl|dir|distr|div|do|dors?|dw)\.
\s(e\.d|eerw|eig|em|eng|enk|enz|evt?|ex|exc|excl|fa|febr|fec|fig|fl|fol|fr|ge|geb|gebr|geh|gem|gen|gep|gesch|get|gez|gld|gr|gymn)\.
\s(hd|herv|hh|hoogl|hr|hs|ib|ibid|id|impr?|incl|inf|ing|inl|insp|inte|inz|ir|isr|it|jan|jg|jhr|jkvr|jl|jr)\.
\s(kan|kand|kap|kapt|kar|kath|kl|kon|lat|lb|li|lib|lic|ll|log|lt|mad|maj|max|med\.\sdrs|mej|mevr|mgr|mij|mil|min|mld|mln|mr|ms|muz|mv|mw)\.
\s(nat|ndl?|ned|nl|nom?|nr|ob|obl|okt|ong|onz|op|op\.?\s?cit|opm|o\.m|opp|pag|par|pct|pd|perf|pers|plm?|praes|pres|pro|proc|prop|prot|ps)\.
\s(red|reg|resp|rom|schr|sc|sec|sept|spr|sq|sr|st|stb|stct|st\.-gen|subst?|tab|td|tel|temp|test|tgov|tit|trim|tw|U\.S\.?A?)\.
\s(vac|val|vdt|verb|verg|versch|vert|vgl|vid|vlg?|vnl|vnw|vo+lg|vo+rl|vo+rm|vo+rw|vo+rz|vr|vr\.pr|vs|vz|wd|wed|weled|weledel|weledelgeb)\.
\s(weledelgestr|wleerw|wsch|z\.br|z\.em|z\.exc|zgn?|zpg|zr)\.
\s(a\.(a\.)?(u\.)?|a\.b|a\.c|a\.d|a\.e|a\.g|a\.h\.d|a\.h\.w|a\.i|a\.p|a\.s|a\.u\.b|a\.v|a\.w|b\.b\.h\.h|b\.d|b\.\sen\sw|b\.g*|b\.i|b\.lo?b\.v|b\.w)\.
\s(c\.a\.?o?|c\.if?|c\.l|c\.o|c\.q|c\.s|d\.a\.v|d\.d|d\.i|d\.m\.v|d\.p|d\.t\.p|d\.v|d\.w\.z|e\.a|e\.c\.g?|e\.d|e\.(e\.)?g|e\.i|e\.k|e\.o|e\.p|e\.v\.a?)\.
\s(f\.a\.q|f\.d\.c|f\.o\.b|f\.o\.r|g\.g\.d|g\.o|g\.t|g\.v\.d|h\.a|h\.h|h\.b\.b\.h\.h|h\.c|h\.d\.s|h\.e|h\.i|h\.k\.h|h\.m|h\.o|h\.o\.k\.t|h\.o\.l\.t|h\.r)\.
\s(h\.s|h\.t|h\.t\.l|i\.a\.a|i\.b\.d|i\.b\.v|i\.c|i\.e|i\.g\.st|i\.g\.z|i\.h\.a|i\.h\.b|i\.m|i\.s\.m|i\.o|i\.p\.v|i\.t\.t|i\.v|i\.v\.m|i\.v\.o|i\.z\.g\.st|J\.C)\.
\s(k\.g\.v|k\.k|k\.o|l\.b|l\.b\.o|l\.c|l\.g|m\.a\.w|m\.b\.t|m\.b\.v|m\.d|m\.g|m\.i|m\.v|m\.e\.t|m\.h\.d|m\.h\.g|m\.m|m\.m\.l|m\.m\.k|m\.v|m\.n|m\.o|m\.o\.b)\.
\s(m\.u\.v|m\.z|n\.a|n\.a\.g|n\.a\.v|n\.Chr|n\.h|n\.l|n\.m|n\.m\.m|n\.n|n\.b|n\.br|n\.n\.o|n\.n\.w|n\.o|n\.o\.m|n\.o\.t\.k|n\.s|n\.t|n\.v\.t|n\.w)\.
\s(o\.a|o\.b|o\.c|o\.e\.r|o\.g|o\.i|o\.i\.d|o\.i\.o|o\.k|o\.l\.v|o\.o|o\.o\.v|o\.r|o\.r\.t|o\.v\.v|o\.w|p\.a|p\.c|p\.d|p\.e|p\.f|p\.j|p\.m|p\.o|p\.p|p\.p\.d|p\.r|p\.w)\.
\s(r\.f\.s\.v\.p|r\.s\.v\.p|r\.i\.p|s\.g|s\.h|s\.m|s\.s|s\.s\.t\.t|s\.t|s\.v|s\.v\.p|t\.a\.n|t\.a\.p|t\.a\.v|t\.b|t\.b\.c|t\.b\.v|t\.d|t\.d\.e|t\.e\.a\.b|t\.g\.t|t\.g\.v)\.
\s(t\.h|t\.h\.t|t\.k\.a|t\.l\.v|t\.n\.v|t\.o|t\.o\.v|t\.w|t\.w\.z|t\.z|t\.z\.p|t\.z\.t|t\.z\.v|v\.a|v\.a\.g\.v|a\.b|v\.c|v\.chr|v\.d|v\.d\.e\.n|v\.d\.j|v\.d\.s)\.
\s(v\.g\.a|v\.g\.g\.v|v\.g\.h|v\.h|v\.h\.t\.h|v\.i\.o|v\.j|v\.k|v\.k\.a|v\.l\.n\.r|v\.l\.o|v\.m|v\.o|v\.o\.n|v\.r\.n\.l|v\.v)\.
\s(w\.g|w\.i|w\.l|w\.o|w\.vl|w\.v\.s|w\.v\.s\.tr|w\.v\.s\.tr|w\.v\.t\.t\.k|z\.a|z\.b|z\.b\.b\.h\.h|z\.d|z\.d\.h|z\.e|z\.g|z\.g\.a\.n|z\.h)\.
\s(z\.h\.s|z\.i|z\.j|z\.k|z\.k\.h|z\.k\.m|z\.o|z\.o\.z|z\.s\.m|z\.t|z\.z\.g\.g|z\.z\.o|z\.z\.w)\.
Additions and improvements are always welcome.
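When editing lists of this size, a small check can catch slips such as a repeated abbreviation or an empty alternative (a stray `|` before the closing parenthesis, which would make the expression match any period preceded by whitespace). The helper below is a hypothetical sketch that assumes every expression has the form `\s(...)\.`; the two patterns passed to it are made-up examples, not lines from the lists above.

```python
def top_level_alternatives(pattern):
    """Split the alternation of an exception expression on top-level '|'."""
    prefix, suffix = r"\s(", r")\."
    if not (pattern.startswith(prefix) and pattern.endswith(suffix)):
        raise ValueError("expected an expression of the documented form")
    body = pattern[len(prefix):-len(suffix)]
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            current.append(body[i:i + 2])  # keep an escaped pair together
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

def lint(pattern):
    """Report empty and duplicated alternatives in one expression."""
    parts = top_level_alternatives(pattern)
    seen, duplicates = set(), []
    for part in parts:
        if part in seen and part not in duplicates:
            duplicates.append(part)
        seen.add(part)
    return {"empty": parts.count(""), "duplicates": duplicates}

print(lint(r"\s(max|mid|nov|no|nos|nov|ib(id)*)\."))  # one duplicate: nov
print(lint(r"\s(kl|kpm|kontr|)\."))                   # one empty alternative
```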