|
The Paragraph Flow Object
By Didier PH Martin, June 26, 1999
Contents
- Introduction
- Visual Model
- Logical Model
- Models Synthesis
- Paragraph Characteristics
- OpenJade Paragraph Object Translation
Introduction
The paragraph object is a fundamental DSSSL object. It is as fundamental as paragraphs are to written documents. It is a container object which can contain other flow objects.
Throughout this text, "containment" means both visual containment and
logical containment in the form of a collection. The terms "flow object"
and "formatting object" also express the same concept in this document:
the DSSSL specification uses "flow object", and we add "formatting object"
to relate the concept to other formatting languages that use that
terminology.
DSSSL allows paragraph objects to be mapped to SGML or XML elements. Or,
from another perspective, each SGML or XML element is mapped to a DSSSL
formatting object.
A paragraph object has a property set whose values are set by a DSSSL
script. In the DSSSL specification, a formatting object property is called
a characteristic. A formatting object's property set can also be related
to a grove's property set: a property set is an abstract data model
associated with an object.
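As a minimal sketch of how a script sets characteristics, the construction rule below maps a hypothetical para element to a paragraph flow object and assigns three characteristics. The element name and the chosen values are assumptions made for illustration only; the characteristic names are those of the DSSSL paragraph flow object class.

```scheme
;; Hypothetical rule: every <para> element becomes a paragraph flow object.
(element para
  (make paragraph
    font-size: 11pt          ; body size to which the font is scaled
    line-spacing: 13pt       ; normal spacing between lines
    quadding: 'start         ; align lines at the start edge
    (process-children)))     ; the element content fills the paragraph
```

Characteristics that are not given in the make expression keep their inherited or initial values.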
The paragraph object is
equivalent to the CSS
block object or to the XSL fo:block object.
A paragraph flow object is not restricted to visual rendition. Packages
like BraiFo transform any XML/SGML document into Braille. Even though not
all visual characteristics are supported in Braille, the ones that are
useful in this context (indentation, quadding, etc.) still apply.
The Visual Model
In DSSSL, all visual objects
are areas, as defined in the specifications:
| "An area is a rectangular box with
a fixed width and height. An area is also a specification of a set
of marks that can be imaged on a presentation medium. An area may
contain other areas." |
More particularly, in the
DSSSL specifications, the paragraph object is a display
area:
| "Display areas are areas that are
not directly parts of lines. A display area has
an inherent absolute orientation.
NOTE 43 Informally, the box has an arrow on it
saying ‘this way up’.
The
positioning of display areas is specified by area containers. An area
container has its own coordinate system
with its origin at the lower left corner, the positive x-axis
extending horizontally to the right and the
positive y-axis extending vertically
upward." |

- For instance, an area container could be a page flow object. Thus, a
page can contain paragraphs. This is because a page object is a display
area.
- The area container imposes a direction on the display areas it contains.
For instance, a page object (an area container) imposes a direction on a
set of paragraphs (display areas). For most Western languages, the
direction is top-down.
- In the same vein, a paragraph, being both an area container and a
display area, can contain other flow objects.
The Logical Model
| "A paragraph flow object represents a paragraph.
It has a single principal port. The contents of this port may be either inlined or displayed. Inline flow
objects are formatted to produce line areas. Displayed flow
objects implicitly specify a break, and their areas shall be added
to the resulting sequence of areas. A paragraph flow object may
only be displayed." |
Several DSSSL flow objects are collection containers. Thus, a flow object
may be perceived both as a collection of flow objects and as a layout
configuration for them. For the paragraph flow object, this means that it
contains a single flow object collection (also called a stream), and that
this collection is laid out within a bounded area. So, like all other
DSSSL flow objects, a paragraph flow object has two facets:
- An abstract one: a single collection of flow objects.
- A visual one: a particular layout of flow objects within a bounded area.
So, the paragraph object is contained in its parent's collection (i.e., a
stream attached to the parent's port). It is itself a flow object
container with a single port (i.e., a single collection).
For example, a
simple-page-sequence flow object contains several paragraph flow objects
and one of these flow objects contains:
- a line-field flow object
- an embedded-text flow object
- a line-field flow object
A flow object collection (i.e., a stream) is ordered. Thus, in the example
above, the paragraph object contains three objects. The first one in the
collection is displayed first, and the last one is displayed last. This is
why these objects are called flow objects: a flow has an implicit order.
Objects contained in a stream (i.e., a collection) are placed within the
container area one after the other, in the same order they have within the
collection.
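The ordering can be sketched in DSSSL as follows. This is an illustration under assumptions, not the script behind the example above: literal strings stand in for real content, line-field is assumed for the two objects surrounding the middle one, and a plain sequence flow object is swapped in for the embedded-text object to keep the sketch small. The field widths are arbitrary.

```scheme
;; Sketch only: literal strings stand in for real document content.
(root
  (make simple-page-sequence
    ;; An ordinary paragraph whose stream holds a single run of text.
    (make paragraph
      (literal "An ordinary paragraph."))
    ;; A paragraph whose stream holds three flow objects, in this order.
    (make paragraph
      (make line-field          ; first in the stream, placed first
        field-width: 2cm
        (literal "A"))
      (make sequence            ; second (stand-in for embedded-text)
        (literal "B"))
      (make line-field          ; third in the stream, placed last
        field-width: 2cm
        (literal "C")))))
```

Swapping two of the inner make expressions changes the order in the stream, and therefore the order in which the resulting areas are placed.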
Models Synthesis
Is something missing?
Yes: the processed SGML or XML content, that is, the data content
associated with a particular SGML or XML element. The element content is
usually included in a paragraph object's collection with the DSSSL
process-children construct.
For example, a DSSSL script
creates the following flow objects collection:
- Data content from an XML or SGML element
- a line-field flow object
- an embedded-text flow object
- a line-field flow object
- Data content from an XML or SGML element
The logical and visual models of this flow object are produced from an XML
or SGML fragment and a DSSSL script.
| <par>this paragraph contains embedded text.
In fact, it is a Japanese text enclosed by two line-field objects
<price>定価 2800円</price></par>
SGML or XML fragment
(element par
  (make paragraph
    (process-children)))

(element price
  (make display-group
    (make line-field)
    (process-children)
    (make line-field)))
DSSSL script fragment
|
In the example above, we used the display-group flow object to aggregate
several flow objects into a single entity.
Paragraph characteristics
The paragraph flow object's property set gives the rendering engine more
information on how to display the paragraph. How paragraph object
properties are translated into a visual model is shown in the figure below.
The paragraph property set is composed of 59 properties, as shown in the
table below. Some of these properties are inherited from container
objects; others are not.
|
Property name |
Description |
| lines: |
is a
symbol specifying how the content of the paragraph shall be broken
into lines in the formatted output. |
| asis-truncate-char: |
is
either #f or a char object that determines the glyph to be inserted
when the lines: characteristic has the value asis-truncate and a
line is truncated. The initial value is #f. |
| asis-wrap-char: |
is
either #f or a char object that determines the glyph to be inserted
at the end of a line when the lines: characteristic has the value
asis-wrap and the line is broken other than after a character flow
object for which the record-end?: characteristic is true. The
initial value is #f. |
| asis-wrap-indent: |
is a
length-spec giving an indent to be added to the start-indent when
the lines: characteristic has the value asis-wrap for a line
following a break other than after a character flow object for which
the record-end?: characteristic is true. The initial value is
#f. |
| first-line-align: |
is
either #f, #t, or a char object. If it is not #f, then the quadding:
and last-line-quadding: characteristics are ignored for the first
line of the paragraph, and the first line shall be aligned using an
alignment point in the line. If the value is a char object, then the
alignment point shall be the position point of the first area
produced by the first occurrence on the line of a character flow
object with a char: characteristic equal to that char object;
otherwise, the alignment point shall be the position of the first
alignment-point flow object in the line. If alignment-point-offset:
is not #f, then the first line of the paragraph shall be aligned so
that the percentage of the line length (that is, the display-size
less the applicable start and end indents) before the alignment
point is equal to the value of alignment-point-offset:. If
alignment-point-offset: is #f, then the paragraph is an
externally aligned paragraph and shall have an ancestor of
class table-cell or aligned-column. Furthermore, the area container
in which the areas from this paragraph are placed shall be the same
as the area container in which the areas from that ancestor are
placed; in this case, the paragraph shall be aligned so that its
alignment point is aligned with other such paragraphs in the
table-column or aligned-column. If an externally aligned paragraph
occurs in a table-cell, then the table-auto-width feature shall be
enabled. The initial value is #f. |
| alignment-point-offset: |
is
either #f or a number between 0 and 100 specifying the percentage of
the line length (that is, the display-size less the start and end
indents) before the alignment point. The initial value is
50. |
| ignore-record-end?: |
is a
boolean specifying whether a record-end shall be ignored. If this
characteristic is true, then a character with the record-end?
property true shall be ignored. The initial value is
#f. |
| expand-tabs?: |
is either #f or a strictly positive integer specifying the
tab interval. When a tab interval is specified, each character flow
object that has the input-tab?: characteristic true shall be treated
as equivalent to the smallest strictly positive number of spaces
that when added to the number of character flow objects following
the last preceding record-end character flow object shall be a
multiple of the tab interval. The initial value is
8. |
| line-spacing: |
is a
length-spec giving the normal spacing between the placement paths of
lines in the paragraph as described in 12.6.6.1. The initial value
is 12pt. |
| line-spacing-priority: |
is
either an integer or the symbol force specifying the priority of any
conditional space before the line. This shall be interpreted in the
same manner as the priority: argument for the display-space
procedure. The initial value is 0. |
| min-pre-line-spacing: |
is a
length-spec specifying the minimum size of the line in the placement
direction before the placement path as described in 12.6.6.1. A
value of #f shall also be allowed, specifying that the value is
determined from the paragraph's font. The initial value is
#f. |
| min-post-line-spacing: |
is a
length-spec specifying the minimum size of the line in the placement
direction after the placement path as described in 12.6.6.1. A value
of #f shall also be allowed, specifying that the value is determined
from the paragraph's font. The initial value is
#f. |
| min-leading: |
is
either #f or a length-spec specifying the minimum space between the
line areas in the placement direction as described in 12.6.6.1. A
value of #f means that the line spacing shall not be automatically
adjusted to take into account the size of the content of the lines.
The initial value is #f. |
| first-line-start-indent: |
is a
length-spec giving an indent to be added to the start-indent for the
first line. The length may be negative. The initial value is
0pt. |
| last-line-end-indent: |
is a
length-spec giving an indent to be added to the end-indent for the
last line. The length may be negative. The initial value is
0pt. |
| hyphenation-char: |
is a
char that is used to determine the glyph that is inserted when
hyphenation is performed. The characteristics of the character flow
object preceding the hyphenation point shall determine the mapping
of the character to a glyph, as well as the font resource and
font-size of the glyph. The initial value is #\- (the hyphen
character). |
| hyphenation-ladder-count: |
is a
strictly positive integer specifying the maximum number of
consecutive lines ending with the same glyph as the glyph determined
by the value of the hyphenation-char: characteristic, or #f
indicating that there is no limit. The initial value is
#f. |
| hyphenation-remain-char-count: |
is a
positive integer specifying the minimum number of characters in a
hyphenated word before the hyphenation character. This is the
minimum number of characters in the word left on the line ending
with the hyphenation character. The initial value is
2. |
| hyphenation-push-char-count: |
is a
positive integer specifying the minimum number of characters in a
hyphenated word after the hyphenation character. This is the minimum
number of characters in the word pushed to the next line after the
line ending with the hyphenation character. The initial value is
2. |
| hyphenation-exceptions: |
is a
list of strings. Each string is a word which may contain hyphen
characters, #\-, indicating where hyphenation may occur. If a word
to be hyphenated occurs in the list, it may only be hyphenated in
the specified places. The initial value is the empty
list. |
| line-breaking-method: |
is #f
or a string specifying a public identifier for the
line-breaking-method to be used for this paragraph. The initial
value is #f. |
| line-composition-method: |
is #f
or a string specifying a public identifier for the
line-composition-method to be used for this paragraph. The initial
value is #f. |
| implicit-bidi-method: |
is #f
or a string specifying a public identifier for the method to be used
for implicitly determining the directionality of the content of the
paragraph. This includes both the writing-mode of characters, which,
when this characteristic is #f, is specified with the writing-mode
characteristic, and how portions of content with a common
writing-mode are nested within each other, which, when this
characteristic is #f, is specified with embedded-text flow objects.
It is part of the semantics of the method which characteristics of
character flow objects, if any, it uses. A method may be specific to
a particular character repertoire, in which case, it may not make
use of any characteristics. It may be part of the semantics of a
method for certain glyph substitutions to be applied depending on
the writing-mode that is determined for a character, and possibly
also on characteristics of the character. The initial value is
#f. |
| glyph-alignment-mode: |
is
one of the symbols base, center, top, bottom, or font specifying the
alignment mode to be used for glyphs. font means that the nominal
alignment mode of the font in the flow object's writing-mode should
be used. The initial value is font. |
| font-family-name: |
is
either #f, indicating that any font family is acceptable, or a
string giving the font family name property of the desired font
resource. The initial value is iso-serif. |
| font-weight: |
is
either #f, indicating that any font weight is acceptable, or one of
the symbols not-applicable, ultra-light, extra-light, light,
semi-light, medium, semi-bold, bold, extra-bold, or ultra-bold,
giving the weight property of the desired font resource. The initial
value is medium. This characteristic is applicable when the
glyph-alignment-mode: is font or when min-pre-line-spacing: or
min-post-line-spacing: is #f. |
| font-posture: |
is
either #f, indicating that any posture is acceptable, or one of the
symbols not-applicable, upright, oblique, back-slanted-oblique,
italic, or back-slanted-italic, giving the posture property of the
desired font resource. The initial value is upright. This
characteristic is applicable when the glyph-alignment-mode: is font
or when min-pre-line-spacing: or min-post-line-spacing: is
#f. |
| font-structure: |
is
either #f, indicating that any structure is applicable, or one of
the symbols not-applicable, solid, or outline. The initial value is
solid. This characteristic is applicable when the
glyph-alignment-mode: is font or when min-pre-line-spacing: or
min-post-line-spacing: is #f. |
| font-proportionate-width: |
is
either #f, indicating that any proportionate width is acceptable, or
one of the symbols not-applicable, ultra-condensed, extra-condensed,
condensed, semi-condensed, medium, semi-expanded, expanded,
extra-expanded, or ultra-expanded. The initial value is medium. This
characteristic is applicable when the glyph-alignment-mode: is font
or when min-pre-line-spacing: or min-post-line-spacing: is
#f. |
| font-name: |
is
either #f, indicating that any font name is acceptable, or a string
which is the public identifier for the font name property of the
desired font resource. When the value is a string, the values of the
font-family-name:, font-weight:, font-posture:, font-structure:, and
font-proportionate-width: characteristics are not used in font
selection. The initial value is #f. This characteristic is
applicable when the glyph-alignment-mode: is font or when
min-pre-line-spacing: or min-post-line-spacing: is
#f. |
| font-size: |
is a
length specifying the body size to which the font resource should be
scaled. The initial value is 10pt. This characteristic is applicable
when min-pre-line-spacing: or min-post-line-spacing: is
#f. |
| numbered-lines?: |
is #t
if the lines produced by this paragraph shall be considered for the
purposes of line numbering, and #f otherwise. The initial value is
#t. |
| line-number: |
is
either #f or an unlabeled sosofo containing only inline flow
objects. If it is a sosofo, then for each line in the paragraph, the
sosofo is formatted to produce a single inline area that is
positioned as an attachment area for the line. The initial value is
#f. |
| line-number-side: |
is
one of the symbols start, end, spread-inside, spread-outside,
page-inside, or page-outside specifying the side of the line for the
attachment specified with the line-number: characteristic. A value
of spread-inside or spread-outside shall be allowed only if the flow
object has an ancestor of class page-sequence. A value of
page-inside or page-outside shall be allowed only if the flow object
has an ancestor of column-set-sequence. |
| line-number-sep: |
is a
length-spec specifying the separation for the attachment specified
with the line-number: characteristic. |
| quadding: |
is
one of the symbols start, end, spread-inside, spread-outside,
page-inside, page-outside, center, or justify specifying the
alignment of lines other than the last line in the paragraph in the
direction determined by the writing-mode. A value of spread-inside
or spread-outside shall be allowed only if the flow object has an
ancestor of class page-sequence. A value of page-inside or
page-outside shall be allowed only if the flow object has an
ancestor of column-set-sequence. The initial value is
start. |
| last-line-quadding: |
is
one of the symbols relative, start, end, spread-inside,
spread-outside, page-inside, page-outside, center, or justify
specifying the alignment of the last line of the paragraph in the
direction determined by the writing-mode. This shall apply also to
any line in the paragraph that immediately precedes a break. A value
of relative means that the value of the quadding: characteristic
shall be used, except when that value is justify, in which case, a
value of start shall be used. A value of spread-inside or
spread-outside shall be allowed only if the flow object has an
ancestor of class page-sequence. A value of page-inside or
page-outside shall be allowed only if the flow object has an
ancestor of column-set-sequence. The initial value is
relative. |
| last-line-justify-limit: |
is a
length-spec specifying the maximum amount of free space in the last
line that shall cause the last line to be justified rather than
aligned as specified by the last-line-quadding: characteristic. The
initial value is 0. |
| justify-glyph-space-max-add: |
is a
length-spec specifying the maximum space that may be added between
glyphs in order to justify a line. The initial value is
0pt. |
| justify-glyph-space-max-remove: |
is a
length-spec specifying the maximum space that may be removed between
glyphs in order to justify a line. The initial value is
0pt. |
| hanging-punct?: |
is a
boolean specifying whether the paragraph shall be formatted with the
punctuation characters hanging into the margin or gutter of a
column. The initial value is #f. |
| widow-count: |
is a
positive integer specifying the minimum number of lines of the
paragraph that shall be kept together at the beginning of an area.
If the widow-count: is n, then no break shall be allowed between the
last n lines of the paragraph. The initial value is
2. |
| orphan-count: |
is a
positive integer specifying the minimum number of lines of the
paragraph that shall be kept together at the end of an area. If the
orphan-count: is n, then no break shall be allowed between the first
n lines of the paragraph. The initial value is
2. |
| language: |
is #f
or a symbol specifying the ISO 639 language code in upper-case. This
affects line composition in a system-dependent way. The initial
value is #f. |
| country: |
is #f
or a symbol specifying the ISO 3166 country code in upper-case. This
affects line composition in a system-dependent way. The initial
value is #f. |
| position-preference: |
is
either #f or one of the symbols top or bottom. This applies if the
flow object is directed into a port on a column-set-sequence flow
object that is flowed into both the top-float and bottom-float zones
of a column-subset and indicates whether the areas from this flow
object may be flowed into only one of the zones. This characteristic
is not inherited. The default value is #f. |
| writing-mode: |
is
one of the symbols left-to-right, right-to-left, or top-to-bottom.
The direction determined by the writing-mode shall be perpendicular
to the placement direction. The initial value is left-to-right. This
controls the orientation of the placement path of the
lines. |
| start-indent: |
is a
length-spec specifying the indent for the edge of the area at the
start in the direction of the writing-mode. The initial value is
0pt. This applies only to lines from the paragraph
itself. |
| end-indent: |
is a
length-spec specifying the indent for the edge of the area at the
end in the direction of the writing-mode. The initial value is 0pt.
This applies only to lines from the paragraph
itself. |
| span: |
is a
strictly positive integer specifying the number of columns that the
areas resulting from this flow object shall span. This
characteristic shall apply if the flow object is directed into a
port on a column-set-sequence flow object that is flowed into the
top-float, bottom-float, or body-text zone of a spannable
column-subset. The initial value is 1. |
| span-weak?: |
is a
boolean specifying whether the areas resulting from this flow object
span weakly rather than strongly. See 12.6.5.1. This characteristic
applies if the flow object is directed into a port on a
column-set-sequence flow object that is flowed into the top-float,
bottom-float, or body-text zone of a spannable column-subset and has
a span: characteristic with a value greater than 1. The initial
value is #f. |
| space-before: |
is an
object of type display-space specifying space to be inserted before,
in the placement direction, the areas produced by the flow object.
This characteristic is not inherited. The default is for no space
before to be inserted. |
| space-after: |
is an
object of type display-space specifying space to be inserted after,
in the placement direction, the areas produced by the flow object.
This characteristic is not inherited. The default is for no space
after to be inserted. |
| keep-with-previous?: |
is a
boolean specifying whether the flow object shall be kept in the same
area as the previous flow object. This characteristic is not
inherited. The default value is #f. |
| keep-with-next?: |
is a
boolean specifying whether the flow object shall be kept in the same
area as the next flow object. This characteristic is not inherited.
The default value is #f. |
| break-before: |
is #f
or one of the symbols page, page-region, column, or column-set
specifying that the flow object shall start an area of that type.
This characteristic is not inherited. The default is
#f. |
| break-after: |
is #f
or one of the symbols page, page-region, column, or column-set
specifying that the flow object shall end an area of that type. This
characteristic is not inherited. The default is
#f. |
| may-violate-keep-before?: |
is a
boolean which, if true, specifies that constraints imposed by the
keep: characteristics of ancestor flow objects on the relative
positioning of this flow object and its previous flow object may not
be respected. This characteristic is not inherited. The default
value is #f. |
| may-violate-keep-after?: |
is a
boolean which, if true, specifies that constraints imposed by keep:
characteristics of ancestor flow objects on the relative positioning
of this flow object and its next flow object may not be respected.
This characteristic is not inherited. The default value is
#f. |
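To make the table more concrete, here is a minimal sketch of three construction rules that set some of the characteristics listed above. The element names title, para, and listing are hypothetical, the font family name is an assumed example, and the values are arbitrary. Whether a given processor or backend honors every characteristic is not guaranteed.

```scheme
;; Hypothetical heading: keep it in the same area as the next flow object.
(element title
  (make paragraph
    font-weight: 'bold
    font-size: 14pt
    line-spacing: 16pt
    keep-with-next?: #t
    (process-children)))

;; Hypothetical body text: justified, with an indented first line.
(element para
  (make paragraph
    quadding: 'justify
    last-line-quadding: 'start          ; do not stretch the last line
    first-line-start-indent: 12pt       ; added to start-indent on line 1
    hyphenation-remain-char-count: 3    ; characters left before the hyphen
    hyphenation-push-char-count: 3      ; characters pushed to the next line
    widow-count: 2
    orphan-count: 2
    (process-children)))

;; Hypothetical program listing: keep the line breaks of the source.
(element listing
  (make paragraph
    lines: 'asis
    font-family-name: "Courier"         ; assumed to be an available family
    start-indent: 24pt
    (process-children)))
```

Because most of these characteristics are inherited, they could also be set on an enclosing flow object and left out of the individual paragraphs; keep-with-next?: is one of the exceptions and has to be given on the paragraph itself.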
OpenJade Paragraph Object Translation
The paragraph object is supported by most OpenJade backend processors and
thus can be translated into MIF, TeX, RTF, and HTML elements. We
illustrate this with an XML document transformed into these formats with a
DSSSL script, as shown below.
<?xml version="1.0"?>
<?xml-stylesheet type="text/dsssl" href="Par.dsl" media="screen,mif"?>
<test>
  <par>This is a paragraph object</par>
</test>
XML document
|
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">

; "test" is the root element of the sample XML document
(element test
  (make simple-page-sequence
    (process-children)))

; each par element becomes a paragraph flow object
(element par
  (make paragraph
    (process-children)))
In the case of HTML output, because of the on-line nature of HTML, the
following DSSSL rule is used instead:
(element test
  (make scroll
    (process-children)))
Thus, for all formats based on a page model, the root element of the XML
document "makes" a simple-page-sequence formatting object, and for on-line
formats rendered in a browser, the root element "makes" a scroll object.
DSSSL scripts
|
The following table illustrates a sample XML document translated into
different target formats:
|
Formats |
Created outputs |
| MIF |
Generated MIF document in the Talva SGML/XML Kit using the following
stylesheet processing instruction:
<?xml-stylesheet type="text/dsssl" href="Par.dsl" media="screen,mif"?>
With the OpenJade command line, use the -t mif option. |
| TeX |
Generated TeX document in the Talva SGML/XML Kit using the following
stylesheet processing instruction:
<?xml-stylesheet type="text/dsssl" href="Par.dsl" media="screen,tex"?>
With the OpenJade command line, use the -t tex option. |
| RTF |
Generated RTF document in the Talva SGML/XML Kit using the following
stylesheet processing instruction:
<?xml-stylesheet type="text/dsssl" href="Par.dsl" media="screen,rtf"?>
With the OpenJade command line, use the -t rtf option. |
| HTML |
Generated HTML document in the Talva SGML/XML Kit using the following
stylesheet processing instruction:
<?xml-stylesheet type="text/dsssl" href="Par.dsl" media="screen"?>
With the OpenJade command line, use the -t html option.
The resulting HTML document is associated with a CSS stylesheet. When the
HTML option is set, OpenJade creates an HTML document and a CSS document.
For more information on this option, read "DSSSL formatting objects
mapping to HTML+CSS" and "The html output format mode". |
All trademarks herein are the property
of their respective owners. Copyright © 1999-2003 Didier
PH Martin,
All rights reserved. Created by Didier PH Martin,
modified: April 7, 2003
|