Performance Comparisons

FORTIS REVOLUTION VS. FORTIS 3 SEGMENTATION REPORT

REPORT DRIVEN BY .RC, .DOC, AND .MIF TEST FILES

Resource file (.rc) Results

The bottom line is that, for .RC files, Fortis 3 segments tags twice as aggressively as Fortis Revolution does: it produces about twice as many segments. In other words, Fortis Revolution is twice as inclusive when it comes to tag segmentation.

                     Case 1                  Case 2                  Case 3
                     Not Trans.  Total Seg.  Not Trans.  Total Seg.  Not Trans.  Total Seg.
Fortis 3             132         272         230         490         136         274
Fortis Revolution    135         135         247         247         136         136
Seg: F3/FR                       2.01                    1.98                    2.01

Table 1.1. Three test cases on RC files yield an average of exactly twice as many segments in Fortis 3 as in Fortis Revolution, with nearly equal numbers of untranslated segments.

In Table 1.1, the import parameters were set to mark empty segments as pretranslated. This left the segments containing translatable text untranslated, while the tag-only segments were marked as pretranslated. Segmentation differed so drastically because Fortis 3 counted each |FMT tag as a separate segment, which created a tag-segment between every pair of text-bearing segments in the file.


Fortis Revolution (segments «1» to «8»), followed by Fortis 3 (segments «1+» to «19+»):
<section> Pocket PC Device Settings<p x='0' t='caption'>«1»

OK<p x='1' t='defpushbutton'>«2»

Cancel<p x='2' t='pushbutton'>«3»

Leave deleted e-mail in your PIM <p x='3' t='control'>«4»

Help<p x='4' t='pushbutton'>«5»

Private<p x='5' t='ce_task_taskprivate'>«6»

First name<p x='6' t='ce_contact_firstname'>«7»

Middle name<p x='7' t='ce_contact_middlename'>«8»

|FMT - Dialog#«1+»

<Coordinates:0, 0, 191, 55>«2»

|FMT - Caption#«3+»¬

Pocket PC Device Settings«4»

|FMT - DefPushbutton#«5+»

<Coordinates:14, 35, 50, 14> OK «6»

|FMT - Pushbutton#«7+»

<Coordinates:70, 35, 50, 14> Cancel«8»

|FMT - Control Button#«9+»

<Coordinates:39, 13, 131, 10> &Leave deleted e-mail in your PIM«10»

|FMT - Pushbutton#«11+»

<Coordinates:126, 35, 50, 14> Help«12»

|FMT - StringTable#«13+»

Private«14»

|FMT - StringTable#«15+»

First name«16»

|FMT - StringTable#«17+»

Middle name«18»

|FMT - StringTable#«19+»
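The doubling in Table 1.1 can be illustrated with a minimal Python sketch that makes every |FMT tag a segment of its own. It assumes that `[!#|]` in the Fortis 3 rule `\|FMT[!#|]+#` means "any character except # or |" (`[^#|]` in standard syntax), an inference drawn from comparing the Fortis 3 and Fortis Revolution patterns in Appendix A. `split_with_tag_segments` is a hypothetical helper, and the sample condenses the string-table entries above.

```python
import re

# Fortis 3 break rule \|FMT[!#|]+#, rewritten with a standard negated class
# (assumption: Fortis 3 writes negated classes as [!...]).
FMT_TAG = re.compile(r"\|FMT[^#|]+#")


def split_with_tag_segments(text):
    """Return text segments, making every |FMT ...# tag a segment of its own."""
    segments, pos = [], 0
    for m in FMT_TAG.finditer(text):
        before = text[pos:m.start()].strip()
        if before:
            segments.append(before)  # text-bearing segment
        segments.append(m.group())   # tag-only segment
        pos = m.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(rest)
    return segments


sample = ("|FMT - StringTable#Private"
          "|FMT - StringTable#First name"
          "|FMT - StringTable#Middle name")
segs = split_with_tag_segments(sample)
tag_segments = [s for s in segs if FMT_TAG.fullmatch(s)]
print(len(segs), len(tag_segments))  # 6 3
```

Three text segments become six segments, half of them tag-only, which is the same 2:1 pattern seen in Table 1.1.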

Microsoft Word (.doc) Results

There is still a higher segment count in Fortis 3 when using the Word filter, but the gap is much smaller than with the .RC filter: on average, Fortis 3 produces 1.13 times as many segments as Fortis Revolution.


                     Case 1                  Case 2                  Case 3
                     Not Trans.  Total Seg.  Not Trans.  Total Seg.  Not Trans.  Total Seg.
Fortis 3             352         552         288         500         1224        1841
Fortis Revolution    425         518         366         440         1438        1544
Seg: F3/FR                       1.07                    1.14                    1.19

Table 1.2. Three .DOC test cases show that, with the Word filter, Fortis 3 produces on average 1.13 times as many segments as Fortis Revolution.

In .DOC files, the |FMT tag difference is still the main cause of the segment-count disparity, although there are far fewer |FMT tags than in .RC files. While Fortis Revolution produces fewer segments, the segmentation differences within its output arise mainly from the regular expression engine: when the Fortis 3 regular expressions are run in the Fortis Revolution regular expression engine, Fortis Revolution overlooks the segmentation exceptions those expressions define.
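The side-by-side patterns in Appendix A suggest why this happens. Where Fortis Revolution writes `[^...]`, `(?!...)`, and `.*`, the Fortis 3 versions appear to write `[!...]`, `(!...)`, and `&`. This mapping is inferred from comparing the two columns, not taken from Fortis documentation. If it holds, a standard engine reads `[!#|]` as a class containing `!`, `#`, and `|`, and reads `(![A-Z])` as a group that needs a literal `!`, so Fortis 3 rules and exceptions silently stop matching. The hypothetical `f3_to_standard` helper below is a minimal Python sketch of converting such patterns before loading them. It ignores edge cases such as a literal `]` at the start of a class.

```python
import re


def f3_to_standard(pattern):
    """Rewrite a Fortis 3-style pattern into standard regex syntax.

    Assumed dialect mapping (inferred from Appendix A, not documented):
      [!...]  ->  [^...]   negated character class
      (!...)  ->  (?!...)  negative lookahead
      &       ->  .*       any run of characters (outside classes)
    Escapes such as \\. and \\| are copied unchanged.
    """
    out, i, in_class = [], 0, False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep the escape pair as-is
            i += 2
        elif in_class:
            in_class = c != "]"           # a closing bracket ends the class
            out.append(c)
            i += 1
        elif pattern.startswith("[!", i):
            out.append("[^")
            in_class = True
            i += 2
        elif c == "[":
            out.append(c)
            in_class = True
            i += 1
        elif pattern.startswith("(!", i):
            out.append("(?!")
            i += 2
        elif c == "&":
            out.append(".*")
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


raw = r"\|FMT[!#|]+#"
print(re.search(raw, "|FMT - Text#"))                  # None: [!#|] read as a literal class
print(re.search(f3_to_standard(raw), "|FMT - Text#"))  # matches the whole tag
```

Patterns already written in standard syntax, such as `(?!\p{L})`, pass through unchanged, so the helper could be applied to a mixed list.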


Fortis Revolution (segments «5» to «11»), followed by Fortis 3 (segments «20» to «23»):
This application claims priority from US Provisional Application Serial No.«5»

60/727,472 by Boardman et al.«6»

, entitled "Method of Making Light Emitting Device with Silicon-Containing Encapsulant", filed Oct. «7»

17, 2005 (61339US002).<PP "Normal" 2>«8»

This application is related to: «9»

commonly assigned, co-pending U.S. «10»

Patent Application Ser. «11»

<F 0><Tab>This application claims priority from US Provisional Application Serial No. 60/727,472 by Boardman et al., entitled "Method of Making Light Emitting Device with Silicon-Containing Encapsulant", filed Oct. 17, 2005 (61339US002). «20»

|FMT - Text#«21+»

<Tab>This application is related to: «22»

commonly assigned, co-pending U.S. Patent Application Ser. «23»


Fortis Revolution (segments «162» to «170»), followed by Fortis 3 (segments «189+» to «191+»):
A variety of first catalysts are disclosed, for example, in U.S. «162»

Pat. «163»

Nos. «164»

6,376,569 (Oxman et al.)«165»

, 4,916,169 (Boardman et al.)«166»

, 6,046,250 (Boardman et al.)«167»

, 5,145,886 (Oxman et al.)«168»

, 6,150,546 (Butts), 4,530,879 (Drahnak), 4,510,094 (Drahnak) 5,496,961 (Dauth), 5,523,436 (Dauth), 4,670,531 (Eckberg), as well as International Publication No. «169»

WO 95/025735 (Mignani).<PP "Normal" 3>«170»

|FMT - Text#«189+»

A variety of first catalysts are disclosed, for example, in U.S. Pat. Nos. 6,376,569 (Oxman et al.), 4,916,169 (Boardman et al.), 6,046,250 (Boardman et al.), 5,145,886 (Oxman et al.), 6,150,546 (Butts), 4,530,879 (Drahnak), 4,510,094 (Drahnak) 5,496,961 (Dauth), 5,523,436 (Dauth), 4,670,531 (Eckberg), as well as International Publication No. WO 95/025735 (Mignani). «190»

|FMT - Text#«191+»

Fortis Lichida shows spurious segmentation here after acronyms and abbreviations. The exceptions list, which tells it that the period after such a word is not actually a sentence boundary, does not appear to be functioning correctly.
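The spurious breaks above can be reproduced with a toy segmenter. This is a minimal Python sketch, not Fortis code. It approximates the Fortis Revolution break rule `[\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*` from Appendix A (Python's `re` has no `\p{L}`, so `[^\W\d_]` stands in for "any letter"). `EXCEPTIONS` is a hypothetical four-entry list in the style of the exception expressions. A candidate break is skipped whenever its punctuation mark falls inside an exception match.

```python
import re

# Approximation of the Fortis Revolution break rule [\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*.
# Python's re has no \p{L}, so [^\W\d_] stands in for "any letter".
LETTER = r"[^\W\d_]"
BREAK = re.compile(rf'(?:{LETTER}|[>/)"#])[.:?!]\)*(?!{LETTER})\s*')

# Hypothetical four-entry exception list in the style of the English exceptions.
EXCEPTIONS = re.compile(r"\b(?:U\.S|Pat|Nos|et\s+al)\.")


def segment(text, exceptions=None):
    """Split text at each BREAK match unless its punctuation mark falls inside
    a span matched by the exceptions pattern."""
    protected = [m.span() for m in exceptions.finditer(text)] if exceptions is not None else []
    segments, start = [], 0
    for m in BREAK.finditer(text):
        mark = m.start() + 1  # BREAK begins with one letter or symbol, then the mark
        if any(s <= mark < e for s, e in protected):
            continue  # an exception covers this period: not a sentence boundary
        segments.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


sample = "See U.S. Pat. Nos. 6,376,569 (Oxman et al.) and others. A second sentence follows."
print(len(segment(sample)))              # 6: breaks after U.S., Pat., Nos. and et al.)
print(len(segment(sample, EXCEPTIONS)))  # 2
```

Without exceptions the sketch splits after "U.S.", "Pat.", "Nos." and "et al.)", much like segments «162» to «166» above; with the exceptions applied only the real sentence boundary remains.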

Framemaker (.mif) Results

The Framemaker filter shows the largest segmentation disparity of the three filters. Several additional kinds of tag-segments cause the expansion.


                     Case 1                  Case 2                  Case 3
                     Not Trans.  Total Seg.  Not Trans.  Total Seg.  Not Trans.  Total Seg.
Fortis 3             324         541         96          190         320         706
Fortis Revolution    302         302         79          79          222         236
Seg: F3/FR                       1.79                    2.41                    2.99

On average, Fortis 3 produces 2.40 times as many segments as Fortis Revolution. In Fortis 3, the |FMT, |Variable, |CondM, and |Frame# tags often become tag-segments.
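The Seg: F3/FR rows and the averages quoted for the three filters can be rechecked from the Total Seg. columns. The short Python sketch below uses illustrative names (`TOTALS`, `ratios`, `average_ratio`). It reproduces the quoted averages of 1.13 for .DOC and 2.40 for .MIF as unweighted means of the three per-case ratios. The third .DOC case works out to 1841/1544 ≈ 1.19.

```python
# Total Seg. values (Fortis 3, Fortis Revolution) for the three test cases of each filter.
TOTALS = {
    ".rc": [(272, 135), (490, 247), (274, 136)],
    ".doc": [(552, 518), (500, 440), (1841, 1544)],
    ".mif": [(541, 302), (190, 79), (706, 236)],
}


def ratios(pairs):
    """Per-case Fortis 3 / Fortis Revolution segment ratios."""
    return [f3 / fr for f3, fr in pairs]


def average_ratio(pairs):
    """Unweighted mean of the per-case ratios."""
    r = ratios(pairs)
    return sum(r) / len(r)


for ext, pairs in TOTALS.items():
    print(ext, [round(x, 2) for x in ratios(pairs)], round(average_ratio(pairs), 2))
```

A weighted figure (total Fortis 3 segments divided by total Fortis Revolution segments) would give different .DOC and .MIF averages, so the choice of mean matters when comparing reports.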

These tags can interrupt the segmentation of sentences. In some cases, tags within the text have caused segment breaks, which naturally reduces the number of partially translated and fuzzy-matched segments. The following is a prime example:

Fortis Revolution (through segment «36»), followed by Fortis 3 (segments «108» to «113»):
Errors on the <o t="var" x="Product Name"/><f x='zvb variable '/><f x='* '/> have many causes, including poor protocol writing, incorrect operator setup, variation in plates, hardware failure and software failure. <p t='bo Body' x='60'/>

<o t="marker" x="Conditional Text" v="+760911"/>It is important to understand that error handling is a normal part of operating the <o t="var" x="Product Name"/><f x='zvb variable '/><f x='* '/> and that errors usually do not mean that the <o t="var" x="Product Name"/><f x='zvb variable '/><f x='* '/> has malfunctioned. «35»

Most errors are a result of operator error.<p t='bo Body'/> <p t='lip LinePrintOnly'/> <o t="marker" x="Cross-Ref" v="36879: mt MapTop: Compilation Warnings and Errors"/>Compilation Warnings and Errors<p t='mt MapTop' x='61'/> <p t='lim LineMap'/>«36»

Errors on the |Variable: "Product Name"#<F 0><F 1> have many causes, including poor protocol writing, incorrect operator setup, variation in plates, hardware failure and software failure. «108»

|FMT - Text#«109+»

|CondM: "+760911"#It is important to understand that error handling is a normal part of operating the |Variable: "Product Name"#<F 0><F 1> and that errors usually do not mean that the |Variable: "Product Name"#<F 0><F 1> has malfunctioned. «110»

Most errors are a result of operator error. «111»

|FMT - Text#«112+»

|XRefPoint: "36879: mt MapTop: Compilation Warnings and Errors"#Compilation Warnings and Errors«113»


Fortis Lichida appears to be missing some segmentation boundaries here that should be mandatory.
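One way to check a paragraph-tag boundary rule is to run it alone against text like segment «36». The minimal Python sketch below uses the tuned Framemaker rule listed under Segmentation Rules, and also the copy of that rule in Appendix A, which shows a space after `\s*=\s*`. If that space is really present in the configured rule (it may only be a rendering artifact of this page), the rule cannot match attributes written as `t='bo Body'`. That would be one possible explanation for the missing boundaries, not a confirmed diagnosis. `split_after_tags` and the shortened sample are illustrative.

```python
import re

# Tuned Framemaker paragraph-tag rule (Segmentation Rules section).
TUNED = re.compile(
    r"""(<\s*/?\s*[px](\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*(('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>\s*)+"""
)
# Appendix A copy of the same rule, with a literal space after \s*=\s*.
APPENDIX = re.compile(
    r"""(<\s*/?\s*[px](\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s* (('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>\s*)+"""
)


def split_after_tags(rule, text):
    """Cut text after every run of <p .../> or <x .../> tags matched by rule."""
    segments, start = [], 0
    for m in rule.finditer(text):
        segments.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


sample = ("Most errors are a result of operator error.<p t='bo Body'/> "
          "<p t='lip LinePrintOnly'/> "
          '<o t="marker" x="Cross-Ref"/>Compilation Warnings and Errors'
          "<p t='mt MapTop' x='61'/>")
print(len(split_after_tags(TUNED, sample)))     # 2
print(len(split_after_tags(APPENDIX, sample)))  # 1
```

The tuned rule separates the heading from the preceding sentence, as Fortis 3 does with «111» and «113»; the `<o .../>` marker is not a `[px]` tag, so it stays inside the second segment.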

Appendix A

For convenience, the regular expressions involved are reproduced below.


Regular Expressions governing Resource File filter segmentation in Fortis Revolution and Fortis 3:

Fortis Revolution:

<p x=.*>\s*«»

Fortis 3:

FMT –

\|FMT[!#|]+#«»

Regular Expressions governing Word filter segmentation in Fortis Revolution and Fortis 3:

Fortis Revolution:

<(PP |TC |TR |SS |@BR)([^">]|("([^\\"]|(\\.))*?"))*?>«»

[\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*«»

Fortis 3:

«»(\n+\|FMT -&#)+

\|FMT[!#|]+#(\n\|FMT[!#|]+#)*«»

[A-Za-zÀ-ÿ>/)"#][.:?!]\)*(!(\s\|?"&"(#|(\|"&"#)|(, "&"#))))(![!\n\s<(|])\s*«»

Regular Expressions governing Framemaker filter segmentation in Fortis Revolution and Fortis 3:

Fortis Revolution:

[\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*«»

(<\s*/?\s*[px](\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s* (('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>\s*)+«»

<\s*/\s*table\s*>«»

Fortis 3:

«»\n+\|FMT –

\|FMT[!#|]+#«»

[A-Za-zÀ-ÿ>/)"#][.:?!]\)*(!(\s\|?"&"(#|(\|"&"#)|(,"&"#))))(![!\n\s<(|])\s*«»

Regular Expressions governing segmentation exceptions in Fortis Revolution and Fortis 3:

Fortis Revolution Fortis 3
attn\.

avg\.

bldg\.

bldr\.\)*\s«»

chronol\.\)*\s«»

Dr\.

etc\.(?!\s\p{Lu})

fwd\.

gds\.

gra(d|ph)\.

hf\.

hr\.

incl?\.

inj\.

inv\.

\bpt\.

(^|[\s>(\-])(ab(br*(ev)*|str)*|comp|e(lec|ncl*|ngr|tc)|fig|m(ax|i[dn]) |no|num|pt|r(ect*|el|esp*)|s(tr|ubj)|t(emp|fr)|univ|vt|wt)\.\)*\s«»

^#\s*Created on:.*$

«<\!--.*-->»

«<[A-Za-z][^<>]+>»

«\|((FMT - .*#)|([-A-Za-z0-9()]+:\s*".*"#)|([A-Za-z0-9()]+#))»

«<Style>.*</Style>»

«<Address>.*</Address>»

«<Script>.*</Script>»

[\s\.]((<(!\s)[\s-;=?-~]*>)*[a-zÀ-ÿ]\.)+\)*\s«»

attn\.\)*\s«»

avg\.\)*\s«»

blgd\.\)*\s«»

bldr\.\)*\s«»

chronol\.\)*\s«»

Dr\.\s*«»

dz)\.\)*\s«»

etc\.\s(![A-Z])«»

fwd\.\)*\s«»

gds\.\)*\s«»

gra(d|ph)\.\)*\s«»

hf\.\)*\s«»

hr\.\)*\s«»

incl*\.\)*\s«»

inj\.\)*\s«»

inv\.\)*\s«»

\spt\.\)*\s«»

(^|[\s>(\-])(ab(br*(ev)*|str)*|comp|e(lec|ncl*|ngr|tc)|fig|m(ax|i[dn]) |no|num|pt|r(ect*|el|esp*)|s(tr|ubj)|t(emp|fr)|univ|vt|wt)\.\)*\s«»

^#\s*Created on:&$

«<\!--&-->»

«<[A-Za-z][!<>]+>»

«\|((FM[tT] - &#)|([-A-Za-z0-9()]+:\s*"&"#)|([A-Za-z0-9()]+#))»

«<Style>&</Style>»

«<Address>&</Address>»

«<Script>&</Script>»


FORTIS REVOLUTION VS. FORTIS 3 SEGMENTATION AND TAG PROTECTION CHANGES

Segmentation Rules

We are trying to make the regular expressions as inclusive as possible, rather than maintaining different segmentation rules for each of the main languages. The segmentation rules have been tuned as follows:


Word 2003, by sentences

<(PP |TC |TR |SS |@BR)([^">]|("([^\\"]|(\\.))*?"))*?>«»

[\p{L}>/)"#][.:。?!]\)*(?![\p{L}".])\s*«»


Framemaker, by sentences

[\p{L}>/)"#][.:?!]\)*(?!\p{L})\s*«»

(<\s*/?\s*[px](\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*(('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>\s*)+«»

<\s*/\s*table\s*>«»


RC

<p x=.*>\s*«»

[\p{L}>/)"#][.:?!]\)*(?![\p{L}/)<>"#])\s*«»
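The tuned Word and RC sentence rules can be compared on small inputs. In this minimal Python sketch, `[^\W\d_]` approximates `\p{L}` because Python's `re` has no Unicode property classes, and `split_after` is a hypothetical helper that cuts after each match. The RC rule's lookahead also excludes `/ ) < > " #`, so it does not break in front of an inline tag, whereas the Word rule does. As written, the Word rule breaks after 。 only when no letter follows immediately, so unspaced CJK sentences stay in one segment.

```python
import re

# Python's re has no \p{L}; [^\W\d_] (word characters minus digits and "_")
# is used here as an approximation of "any letter".
LETTER = r"[^\W\d_]"

# Tuned Word 2003 rule:  [\p{L}>/)"#][.:。?!]\)*(?![\p{L}".])\s*
WORD_SENTENCE = re.compile(rf'(?:{LETTER}|[>/)"#])[.:。?!]\)*(?!{LETTER}|[".])\s*')

# Tuned RC rule:  [\p{L}>/)"#][.:?!]\)*(?![\p{L}/)<>"#])\s*
RC_SENTENCE = re.compile(rf'(?:{LETTER}|[>/)"#])[.:?!]\)*(?!{LETTER}|[/)<>"#])\s*')


def split_after(rule, text):
    """Cut text after every match of a break rule, i.e. at each «» position."""
    segments, start = [], 0
    for m in rule.finditer(text):
        segments.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


print(split_after(WORD_SENTENCE, "Error.<b>Retry?"))  # ['Error.', '<b>Retry?']
print(split_after(RC_SENTENCE, "Error.<b>Retry?"))    # ['Error.<b>Retry?']
```

The `(?![...".])` part of the Word rule also keeps an ellipsis such as "Wait... then" from producing a break at the first period.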


Tag Protection

Word 2003:

This has not changed.

(\\.)|(<([^"<>]|("([^\\]|(\\.))*?"))*?>)|(\[.*?\])


Framemaker Filter:

These expressions protect tags that the Framemaker filter has so far ignored.


(<\s*/*\s*[obtfpx]r*(\s+([a-zA-Z0-9_][a-zA-Z0-9_]*)\s*=\s*(('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>)+

</*table.*>


RC Filter:

In order to protect string table constants, I added three expressions to the RC filter tag protection settings:

<section>

<p ([ a-zA-Z0-9]+='.*?')+>

(\\[rtn])+

%(l*[fcsdx])+

%[0-9]+
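The protection patterns above can be exercised directly. This minimal Python sketch assumes the patterns behave the same in Python's `re` as in Fortis (they use only basic constructs); `TAG_PROTECTION` and `protected_spans` are illustrative names. It shows printf-style and `%1` placeholders and literal `\r\n` escapes in an RC string being protected, and a run of adjacent Framemaker `<o .../>` and `<f .../>` tags being protected as a single span.

```python
import re

# Tag-protection patterns from this section, grouped by filter.
TAG_PROTECTION = {
    "word": [r'(\\.)|(<([^"<>]|("([^\\]|(\\.))*?"))*?>)|(\[.*?\])'],
    "framemaker": [
        r"""(<\s*/*\s*[obtfpx]r*(\s+([a-zA-Z0-9_][a-zA-Z0-9_]*)\s*=\s*(('([^\']|(\\.))*')|("([^\"]|(\\.))*")))*\s*/\s*>)+""",
        r"</*table.*>",
    ],
    "rc": [r"<section>", r"<p ([ a-zA-Z0-9]+='.*?')+>", r"(\\[rtn])+", r"%(l*[fcsdx])+", r"%[0-9]+"],
}


def protected_spans(filter_name, text):
    """Return the substrings of text that the filter's patterns would protect."""
    combined = re.compile("|".join(f"(?:{p})" for p in TAG_PROTECTION[filter_name]))
    return [m.group() for m in combined.finditer(text)]


print(protected_spans("rc", r"Copied %d of %ld files to %1.\r\n"))
fm_text = ('Errors on the <o t="var" x="Product Name"/>'
           "<f x='zvb variable '/><f x='* '/> have many causes.")
print(protected_spans("framemaker", fm_text))  # one span covering all three tags
```

Each pattern is wrapped in a non-capturing group before joining, so a top-level `|` inside one pattern (as in the Word expression) cannot leak into its neighbours.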

Segmentation Exceptions

The segmentation exceptions have been rewritten. The expressions are long but clear, to make them easier to edit. Abbreviations are listed by language, in alphabetical order, as follows:


English:


\s(abbr|abbrev|acad|alt|AM|A\.M|AD|A\.D|apr|apt|assn|attn|aug|ave|avg|AWOL|B\.A|B\.C|B\.S|bldg|bldr|blvd|capt|chronol?|cm|cu|ctr)\.

\s(cent|ca?|co|col|cpl|comp|corp|ct|dec|dept|DC|D\.C|deg|dr|div|ed|\(?e\.g|elec|encl|engr|et\.?\s*al|etc|feb|fig|FM|ft|fwd)\.

\s(gal|gen|GMT|gov|grad|graph|hr|hwy|ibid|\(?i\.e|in|inj|incl?|inst|inv|jan|jr|kg|km|lk|ln|lib|lat|lt|ltd|long|M\.A|M\.H|mar)\.

\s(max|MD|mfg|mi|mid|min|m\.d|mph|mm|mg|mr|msgr|mo|mt|mus|nov|no|nos|nov|num|oct|oz|p+|pat|Ph\.D|pl|P\.O|pop|p\.m|prof|pt|qt)\.

\s(rd|rel|resp|rev|r\.n|rpm|sec|sept|sgt|sq|sr|sta|st|ste|str|subj|sun|temp|ter|tpk|univ|U\.S\.?A?|vol|vs|wt|yd)\.


French:

\s(agr|ap|appt|arr|art|at|av|bd|br|bt|b\.lat|cfr|chap|cit|cm|déf|dépt|div|dyn|él|élec|élect)\.

\s(électr|électron|etc|exp|établt|fig|frs|hab|ib(id)*|max|min|no|ob|pos|p\.\s*ex|qqc|qqn|réf|rép|resp|sec|soc)\.


German:

\s(abb|abh|allg|autom|adm|art|bez|bzw|ca|chf|div|dyn|dat|D\.h|dt|elekt?r?|etc|evtl|eig|ehem)\.

\s(fa|fzg|freig|getr|ggfs?|gr|gereg|ges|hinw|hydr|hsp|ind|incl|kl|kpm|kompl|kontr|)\.

\s(max?(sch)?|mech|mechan|mand|mind|mod|mol|nr|nl|nschl|nsp|pkt|pos|prof|spät|spr|sec|sek|schr|std|str|spez)\.

\s(temp|tel|ttl|tzl|u\.a|usw|ve?rz|ve?bl|vgl|vgl\.\s*z\.\s*B|vol|z\.\s*B|zchng|zul|zw|zyl|z\.?t|amp|ant|el|eing|fig|ger|ges|ltr?|od|st(at)?)\.


Dutch:


The Dutch exceptions seem a bit out of hand. Any expertise in this area that would help weed out erroneous abbreviations, or verify useful ones, would be welcome.


\s(acad|acc|adj|adm|afb|afd|afk|afl|afz|al|alg|alt|apr|arr|art|asp|atm|betr|bijl|bijz)\.

\s(blz|br|burg|ca|cand|cap|cat|cf|chr|cie|cod|coöp|ct|dag|dat|deb|dec|del|dept|derg|dgl|dir|distr|div|do|dors?|dw)\.

\s(e\.d|eerw|eig|em|eng|enk|enz|evt?|ex|exc|excl|fa|febr|fec|fig|fl|fol|fr|ge|geb|gebr|geh|gem|gen|gep|gesch|get|gez|gld|gr|gymn)\.

\s(hd|herv|hh|hoogl|hr|hs|ib|ibid|id|impr?|incl|inf|ing|inl|insp|inte|inz|ir|isr|it|jan|jg|jhr|jkvr|jl|jr)\.

\s(kan|kand|kap|kapt|kar|kath|kl|kon|lat|lb|li|lib|lic|ll|log|lt|mad|maj|max|med\.\sdrs|mej|mevr|mgr|mij|mil|min|mld|mln|mr|ms|muz|mv|mw)\.

\s(nat|ndl?|ned|nl|nom?|nr|ob|obl|okt|ong|onz|op|op\.?\s?cit|opm|o\.m|opp|pag|par|pct|pd|perf|pers|plm?|praes|pres|pro|proc|prop|prot|ps)\.

\s(red|reg|resp|rom|schr|sc|sec|sept|spr|sq|sr|st|stb|stct|st\.-gen|subst?|tab|td|tel|temp|test|tgov|tit|trim|tw|U\.S\.?A?)\.

\s(vac|val|vdt|verb|verg|versch|vert|vgl|vid|vlg?|vnl|vnw|vo+lg|vo+rl|vo+rm|vo+rw|vo+rz|vr|vr\.pr|vs|vz|wd|wed|weled|weledel|weledelgeb)\.

\s(weledelgestr|wleerw|wsch|z\.br|z\.em|z\.exc|zgn?|zpg|zr)\.

\s(a\.(a\.)?(u\.)?|a\.b|a\.c|a\.d|a\.e|a\.g|a\.h\.d|a\.h\.w|a\.i|a\.p|a\.s|a\.u\.b|a\.v|a\.w|b\.b\.h\.h|b\.d|b\.\sen\sw|b\.g*|b\.i|b\.lo?b\.v|b\.w)\.

\s(c\.a\.?o?|c\.if?|c\.l|c\.o|c\.q|c\.s|d\.a\.v|d\.d|d\.i|d\.m\.v|d\.p|d\.t\.p|d\.v|d\.w\.z|e\.a|e\.c\.g?|e\.d|e\.(e\.)?g|e\.i|e\.k|e\.o|e\.p|e\.v\.a?)\.

\s(f\.a\.q|f\.d\.c|f\.o\.b|f\.o\.r|g\.g\.d|g\.o\|g\.t|g\.v\.d|h\.a|h\.h|h\.b\.b\.h\.h|h\.c|h\.d\.s|h\.e|h\.i|h\.k\.h|h\.m|h\.o|h\.o\.k\.t|h\.o\.l\.t|h\.r| \s(h\.s|h\.t|h\.t\.l|i\.a\.a|i\.b\.d\.|i\.b\.v|i\.c|i\.e|i\.g\.st|i\.g\.z|i\.h\.a|i\.h\.b|i\.m|i\.s\.m|i\.o|i\.p\.v|i\.t\.t|i\.v|i\.v\.m|i\.v\.o|i\.z\.g\.st|J\.C)\.

\s(k\.g\.v|k\.k|k\.o|l\.b|l\.b\.o|l\.c|l\.g|m\.a\.w|m\.b\.t|m\.b\.v|m\.d|m\.g|m\.i|m\.v|m\.e\.t|m\.h\.d|m\.h\.g|m\.m|m\.m\.l|m\.m\.k|m\.v|m\.n|m\.o|m\.o\.b)\.

\s(m\.u\.v|m\.z|n\.a|n\.a\.g|n\.a\.v|n\.Chr|n\.h|n\.l|n\.m|n\.m\.m|n\.n|n\.b|n\.br|n\.n\.o|n\.n\.w|n\.o|n\.o\.m|n\.o\.t\.k|n\.s|n\.t|n\.v\.t|n\.w)\.

\s(o\.a|o\.b|o\.c|o\.e\.r|o\.g|o\.i|o\.i\.d|o\.i\.o|o\.k|o\.l\.v|o\.o|o\.o\.v|o\.r|o\.r\.t|o\.v\.v|o\.w|p\.a|p\.c|p\.d|p\.e|p\.f|p\.j|p\.m|p\.o|p\.p|p\.p\.d|p\.r|p\.w)\.

\s(r\.f\.s\.v\.p|r\.s\.v\.p|r\.i\.p|s\.g|s\.h|s\.m|s\.s|s\.s\.t\.t|s\.t|s\.v|s\.v\.p|t\.a\.n|t\.a\.p|t\.a\.v|t\.b|t\.b\.c|t\.b\.v|t\.d|t\.d\.e|t\.e\.a\.b|t\.g\.t|t\.g\.v)\.

\s(t\.h|t\.h\.t|t\.k\.a|t\.l\.v|t\.n\.v|t\.o|t\.o\.v|t\.w|t\.w\.z|t\.z|t\.z\.p|t\.z\.t|t\.z\.v|v\.a|v\.a\.g\.v|a\.b|v\.c|v\.chr|v\.d|v\.d\.e\.n|v\.d\.j|v\.d\.s)\.

\s(v\.g\.a|v\.g\.g\.v|v\.g\.h|v\.h|v\.h\.t\.h|v\.i\.o|v\.j|v\.k|v\.k\.a|v\.l\.n\.r|v\.l\.o|v\.m|v\.o|v\.o\.n|v\.r\.n\.l|v\.v)\.

\s(w\.g|w\.i|w\.l|w\.o|w\.vl|w\.v\.s|w\.v\.s\.tr|w\.v\.s\.tr|w\.v\.t\.t\.k|z\.a|z\.b|z\.b\.b\.h\.h|z\.d|z\.d\.h|z\.e|z\.g|z\.g\.a\.n|z\.h)\.

\s(z\.h\.s|z\.i|z\.j|z\.k|z\.k\.h|z\.k\.m|z\.o|z\.o\.z|z\.s\.m|z\.t|z\.z\.g\.g|z\.z\.o|z\.z\.w)\.
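A small helper could at least flag exact duplicates as a first step when weeding out the lists. For example, `m\.v` appears twice in the Dutch expression beginning `\s(k\.g\.v`, `w\.v\.s\.tr` appears twice in the one beginning `\s(w\.g`, and `nov` appears twice in the fourth English expression. The Python sketch below is illustrative: `alternatives` and `duplicates` are hypothetical names, and the splitter assumes each expression has the `\s(...)\.` shape used above. It does not detect near-duplicates or judge whether an abbreviation is valid Dutch.

```python
from collections import Counter


def alternatives(expr):
    """Split an exception expression of the form \\s(alt1|alt2|...)\\. into its
    top-level alternatives. A '|' inside a nested group such as (u\\.)? or after
    a backslash is not treated as a separator."""
    prefix, suffix = "\\s(", ")\\."
    if not (expr.startswith(prefix) and expr.endswith(suffix)):
        raise ValueError("expected an expression of the form \\s(...)\\.")
    body = expr[len(prefix):-len(suffix)]
    alts, current, depth, i = [], [], 0, 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body):
            current.append(body[i:i + 2])  # keep escapes such as \. intact
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c == "|" and depth == 0:
            alts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(c)
        i += 1
    alts.append("".join(current))
    return alts


def duplicates(exprs):
    """Alternatives that occur more than once across the given expressions."""
    counts = Counter(a for e in exprs for a in alternatives(e))
    return sorted(a for a, n in counts.items() if n > 1)


dutch_k = r"\s(k\.g\.v|k\.k|k\.o|l\.b|l\.b\.o|l\.c|l\.g|m\.a\.w|m\.b\.t|m\.b\.v|m\.d|m\.g|m\.i|m\.v|m\.e\.t|m\.h\.d|m\.h\.g|m\.m|m\.m\.l|m\.m\.k|m\.v|m\.n|m\.o|m\.o\.b)\."
print(duplicates([dutch_k]))  # reports m\.v
```

Passing all expressions of one language in a single call also catches an abbreviation repeated across different expressions.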


Additions or improvements are always welcome.

Retrieved from "http://starfish.multiling.com/wiki/index.php/Performance_Comparisons"

This page was last modified 10:18, 27 June 2008.

