Writing HTML with m4

   by Bob Hepple
   on March 1, 1998

   It's amazing how easy it is to write simple HTML pages--and the
   availability of WYSIWYG (what you see is what you get) HTML editors
   like Netscape Gold lulls one into a mood of "don't worry, be happy".
   However, managing multiple, inter-related pages of HTML rapidly gets
   very difficult. I recently had a slightly complex set of pages to put
   together, and I started thinking, "there has to be an easier way."

   I immediately turned to the WWW and looked up all sorts of tools--but
   quite honestly I was rather disappointed. Mostly, they were what I
   would call "typing aids"--instead of having to remember arcane
   incantations like <a href="link">text</a>, you are given a
   button or a magic keychord like alt-ctrl-j which remembers the syntax
   and does all the typing for you.

   Linux to the rescue--since HTML is built as ordinary text files, the
   normal Linux text management tools can be used. This includes revision
   control tools such as rcs and the text manipulation tools like awk,
   Perl, etc. These tools offer significant help in version control and
   managing development by multiple users as well as automating the
   process of displaying information from a database (the classic
   grep | sort | awk pipeline).

   The use of these tools with HTML is documented elsewhere, e.g., Jim
   Weirich's article in Linux Journal Issue 36, April 1997, "Using Perl to
   Check Web Links". I highly recommend this article as yet another way to
   really flex those Linux muscles when writing HTML.

   What I will cover here is work I've done recently using the
   pre-processor m4 to maintain HTML. The ideas can very easily be
   extended to the more general SGML case.

   Using m4

   I decided to use m4 after looking at various other pre-processors
   including cpp, the C front-end, which is perhaps a little too
   C-specific to be useful with HTML. m4 is a generic and clean macro
   expansion program, and it's available under most Unices including
   Linux.

   Instead of editing *.html files, I create *.m4 files with my favourite
   text editor. These files look something like the following:
m4_include(stdlib.m4)
_HEADER(`This is my header')
<P>This is some plain text<P>
_HEAD1(`This is a main heading')
<P>This is some more plain text<P>
_TRAILER

   The format is just HTML code, but you can include files and add macros
   rather like in C. I use a convention that my new macros are in capitals
   and start with an _ character to make them stand out from HTML language
   and to avoid name-space collisions.
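
   The contents of stdlib.m4 are not shown at this point. As a minimal
   sketch, assuming GNU m4 run with -P, the three macros used above could
   be defined along these lines. The HTML emitted by each macro is a guess
   for illustration only, not the author's actual library:

```m4
m4_divert(-1)
m4_dnl Hypothetical stdlib.m4. The HTML in each macro is an assumption.
m4_dnl Output is diverted to -1 so the definitions leave no blank lines.

m4_define(`_HEADER', `<HTML>
<HEAD><TITLE>$1</TITLE></HEAD>
<BODY>
<H1>$1</H1>')

m4_define(`_HEAD1', `<H2>$1</H2>')

m4_define(`_TRAILER', `</BODY>
</HTML>')

m4_divert(0)m4_dnl
```

   With these definitions, _HEADER(`This is my header') would expand to
   the opening tags, using its argument both as the page title and as the
   top-level heading.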

   The m4 file is then processed into an .html file with the command:
m4 -P <file.m4 >file.html

   This process is especially easy if you create a makefile to automate
   these steps in the usual way. For example:
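.SUFFIXES: .m4 .html
# Build file.html from file.m4 (recipe lines must start with a tab)
.m4.html:
	m4 -P <$*.m4 >$*.html
default:        index.html
# Rebuild the pages when the shared macros change
*.html: stdlib.m4
all:    default PROJECT1 PROJECT2
PROJECT1:
	(cd project1; make all)
PROJECT2:
	(cd project2; make all)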

   Some of the most useful commands in m4 are listed here with their cpp
   equivalents shown in parentheses:
     * m4_include: includes a common file into your HTML (#include)
     * m4_define: defines an m4 variable (#define)
     * m4_ifdef: a conditional (#ifdef)
     * m4_changecom: change the m4 comment character (normally #)
     * m4_debugmode: control error diagnostics
     * m4_traceon/off: turn tracing on and off
     * m4_dnl: comment
     * m4_incr, m4_decr: simple arithmetic
     * m4_eval: more general arithmetic
     * m4_esyscmd: execute a Linux command and use the output
     * m4_divert(i): This is a little complicated, so skip on first
       reading. It is a way of storing text for output at the end of
       normal processing. It will come in useful later, when we get to
       automatic numbering of headings. It sends output from m4 to a
       temporary file number i. At the end of processing, any text which
       was diverted is then output, in the order of the file number i.
       File number -1 is the bit bucket and can be used to discard
       chunks of text. File number 0 is the normal output stream.
       Thus, for example, you can use m4_divert to divert text to file 1,
       and it will only be output at the end.
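
   As a small illustration of m4_divert, assuming GNU m4 run with -P, the
   following input writes a footnote first but sends it to file 1, so it
   is held back until the end of processing:

```m4
m4_divert(1)m4_dnl
<P>Footnote: this line was written first but appears last.</P>
m4_divert(0)m4_dnl
<P>This line was written second but appears first.</P>
```

   The output should contain the second paragraph first, followed by the
   footnote. GNU m4 also provides m4_undivert(1) to release the held-back
   text earlier than the end of input.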

   Sharing HTML Elements Across Several Pages

   In many "nests" of HTML pages, each page shares elements such as a
   button bar containing links to other pages like this:
[Home]  [Next]  [Prev]  [Index]

   This is fairly easy to create in each page. The trouble is that if you
   make a change in the "standard" button-bar then you have the tedious
   job of finding each occurrence of it in every file and manually making
   the changes. With m4 we can more easily do this job by putting the
   shared elements into an m4_include statement, just like C.

   Let's also automate the naming of pages by putting the following lines
   into an include file called button_bar.m4:
m4_define(`_BUTTON_BAR',
 `<a href="homepage.html">[Home]</a>
 <a href="$1">[Next]</a>
 <a href="$2">[Prev]</a>
 <a href="indexpage.html">[Index]</a>')

   and then these lines in the document:
m4_include(`button_bar.m4')
_BUTTON_BAR(`page_after_this.html',
 `page_before_this.html')

   The $1 and $2 parameters in the macro definition are replaced by the
   strings in the macro call.
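
   The first and last pages of a set have no previous or next page to link
   to. One possible way to handle that, sketched here on the assumption of
   GNU m4 with -P, is to test each parameter with m4_ifelse and leave the
   link out when the parameter is empty. _NAV_LINK is a hypothetical
   helper, not something from the original button_bar.m4:

```m4
m4_dnl _NAV_LINK(url, label): emit the link only if url is non-empty.
m4_define(`_NAV_LINK',
 `m4_ifelse(`$1', `', `', `<a href="$1">$2</a>')')m4_dnl
m4_define(`_BUTTON_BAR',
 `<a href="homepage.html">[Home]</a>
 _NAV_LINK(`$1', `[Next]')
 _NAV_LINK(`$2', `[Prev]')
 <a href="indexpage.html">[Index]</a>')m4_dnl
m4_dnl On the last page there is no next page, so the first argument is empty:
_BUTTON_BAR(`', `page_before_this.html')
```

   Here the [Next] link is dropped and the other three links are emitted
   as before.
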

   Managing HTML Elements That Often Change

   It is troublesome to have items change in multiple HTML pages. For
   example, if your e-mail address changes, you need to change all
   references to it to the new address. Instead, with m4 you can put a
   line like the following in your stdlib.m4 file:
m4_define(`_EMAIL_ADDRESS', `MyName@foo.bar.com')

   and then just put _EMAIL_ADDRESS in your m4 files.

   A more substantial example comes from building strings with multiple
   components, any of which may change as the page is developed. If, like
   me, you develop on one machine, test out the page and then upload to
   another machine with a totally different address, then you could use
   the m4_ifdef command in your stdlib.m4 file (just like the #ifdef
   command in cpp). For example:
m4_define(`_LOCAL')
...
m4_define(`_HOMEPAGE',
 m4_ifdef(`_LOCAL',
 `//127.0.0.1/~YourAccount',
 `http://ISP.com/~YourAccount'))
m4_define(`_PLUG', `<A HREF="http://www.ssc.com/linux/">
<IMG SRC="_HOMEPAGE/gif/powered.gif"
ALT="[Linux Information]"> </A>')

   Note the careful use of quotes to prevent the variable _LOCAL from
   being expanded. _HOMEPAGE takes on different values according to
   whether the variable _LOCAL is defined or not. This definition can then
   ripple through the entire project as you build the pages.

   In this example, _PLUG is a macro to advertise Linux. When you are
   testing your pages, use the local version of _HOMEPAGE. When you are
   ready to upload, remove or comment out the _LOCAL definition in this
   way:
m4_dnl m4_define(`_LOCAL')

   ... and then re-make.

   Creating New Text Styles

   Styles built into HTML include things like <EM> for emphasis and <CITE>
   for citations. With m4 you can define your own new styles like this:
m4_define(`_MYQUOTE',
 `<BLOCKQUOTE><EM>$1</EM></BLOCKQUOTE>')

   If, later, you decide you prefer <STRONG> instead of <EM>, it is a
   simple matter to change the definition. Then, every _MYQUOTE paragraph
   falls into line with a quick make.

   The classic guides to good HTML writing say things like "It is strongly
   recommended that you employ the logical styles such as <EM>...</EM>
   rather than the physical styles such as <I>...</I> in your documents."
   Curiously, the WYSIWYG editors for HTML generate purely physical
   styles. Using the m4 styles may be a good way to keep on using logical
   styles.

   Typing and Mnemonic Aids

   I don't depend on WYSIWYG editing (having been brought up on troff) but
   all the same I'm not averse to using help where it's available. There
   is a choice (and maybe it's a fine line) to be made between:
<BLOCKQUOTE><PRE><CODE>Some code you want to display.
</CODE></PRE></BLOCKQUOTE>

   and:
_CODE(Some code you want to display.)

   In this case, you would define _CODE like this:
m4_define(`_CODE',
`<BLOCKQUOTE><PRE><CODE>$1</CODE></PRE></BLOCKQUOTE>')

   Which version you prefer is a matter of taste and convenience although
   the m4 macro certainly saves some typing. Another example I like to
   use, since I can never remember the syntax for links, is:
m4_define(`_LINK', `<a href="$1">$2</a>')

   Then, instead of typing:
<a href="URL_TO_SOMEWHERE">Click here to get to SOMEWHERE
</a>

   I type:
_LINK(`URL_TO_SOMEWHERE', `Click here to get to SOMEWHERE')

   Automatic Numbering

   m4 has a simple arithmetic facility with two operators m4_incr and
   m4_decr. This facility can be used to create automatic numbering,
   perhaps for headings, for example:
m4_define(_CARDINAL,0)
m4_define(_H, `m4_define(`_CARDINAL',
 m4_incr(_CARDINAL))<H2>_CARDINAL.0 $1</H2>')
_H(First Heading)
_H(Second Heading)

   This produces:
<H2>1.0 First Heading</H2>
<H2>2.0 Second Heading</H2>
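
   The same idea extends to nested numbering. The following is a minimal
   sketch, assuming GNU m4 with -P, in which a second counter is reset
   whenever the first one advances. The names _SEC, _SUB, _SEC_NO and
   _SUB_NO are hypothetical and chosen only for this illustration:

```m4
m4_define(`_SEC_NO', 0)m4_dnl
m4_define(`_SUB_NO', 0)m4_dnl
m4_dnl _SEC: bump the section counter and restart subsection numbering.
m4_define(`_SEC', `m4_define(`_SEC_NO', m4_incr(_SEC_NO))m4_dnl
m4_define(`_SUB_NO', 0)<H2>_SEC_NO. $1</H2>')m4_dnl
m4_dnl _SUB: bump the subsection counter within the current section.
m4_define(`_SUB', `m4_define(`_SUB_NO', m4_incr(_SUB_NO))m4_dnl
<H3>_SEC_NO._SUB_NO $1</H3>')m4_dnl
_SEC(`Getting Started')
_SUB(`Installing')
_SUB(`Configuring')
_SEC(`Going Further')
_SUB(`Tuning')
```

   This should number the headings 1., 1.1, 1.2, 2. and 2.1 in turn. The
   m4_dnl at the end of each definition only keeps stray blank lines out
   of the output.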

   Automatic Date Stamping

   For simple date stamping of HTML pages, I use the m4_esyscmd command to
   maintain an automatic timestamp on every page:
This page was last updated on m4_esyscmd(date)

   which produces:
This page was last updated on Fri May 9 10:35:03 HKT 1997
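
   The output of date ends with a newline, which m4_esyscmd passes
   through. If a shorter stamp that can sit in the middle of a sentence is
   wanted, one option is to pick a format and strip the newline with
   m4_translit. This is a sketch assuming GNU m4 with -P; the macro name
   _TODAY and the format string are assumptions for illustration:

```m4
m4_dnl The format string is an assumption; any date(1) format will do.
m4_dnl The second argument to m4_translit is a quoted newline to delete.
m4_define(`_TODAY',
 `m4_translit(m4_esyscmd(`date "+%d %B %Y"'), `
')')m4_dnl
This page was last updated on _TODAY.
```

   The stamp then appears inline, with the full stop directly after the
   year.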

   Generating Tables of Contents

   Using m4 allows you to define commonly repeated phrases and use them
   consistently. I hate repeating myself because I am lazy and because I
   make mistakes, so I find this feature an absolute necessity.

   A good example of the power of m4 is in building a table of contents in
   a big page. This involves repeating the heading title in the table of
   contents and then in the text itself. This is tedious and error-prone,
   especially when you change the titles. There are specialised tools for
   generating a table of contents from HTML pages, but the simple facility
   provided by m4 is irresistible to me.

   Simple To Understand TOC

   The following example is a fairly simple-minded table of contents
   generator. First, create some useful macros in stdlib.m4:
m4_define(`_LINK_TO_LABEL',
 `<A HREF="#$1">$1</A>')
m4_define(`_SECTION_HEADER',
 `<A NAME="$1"><H2>$1</H2></A>')

   Then define all the section headings in a table at the start of the
   page body:
m4_define(`_DIFFICULTIES',
 `The difficulties of HTML')
m4_define(`_USING_M4', `Using
 <EM>m4</EM>')
m4_define(`_SHARING', `Sharing HTML
 Elements Across Several Pages')

   Then build the table:
<UL><P>
 <LI> _LINK_TO_LABEL(_DIFFICULTIES)
 <LI> _LINK_TO_LABEL(_USING_M4)
 <LI> _LINK_TO_LABEL(_SHARING)
</UL>

   Finally, write the text:
 ...
_SECTION_HEADER(_DIFFICULTIES)
...

   The advantages of this approach are twofold. If you change your
   headings you only need to change them in one place, and the table of
   contents is then automatically regenerated. Also, the links are
   guaranteed to work.

   Simple To Use TOC

   The table of contents generator that I normally use is a bit more
   complex and requires a bit more study, but it is much easier to use. It
   not only builds the table, but it also automatically numbers the
   headings on the fly--up to four levels of numbering (e.g., section
   3.2.1.3), although this can be easily extended. It is very simple to
   use as follows:
    1. Where you want the table to appear, call _Start_TOC.
    2. At every heading use _H1(`Heading for level 1') or _H2(`Heading for
       level 2') as appropriate.
    3. After the last line of HTML code (probably </HTML>), call _End_TOC.

   The code for these macros is shown in [21]Listing 1. One restriction is
   that you should not use diversions (i.e., m4_divert) within your text,
   unless you preserve the diversion to file 1 used by this TOC generator.
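
   Listing 1 is not reproduced here. To give a feel for how a diversion to
   file 1 can build a table of contents, the following is a much reduced,
   single-level sketch, assuming GNU m4 with -P. It is not the author's
   code, and the names _TOC_BEGIN, _TOC_HEAD, _TOC_END and _TOC_NO are
   hypothetical:

```m4
m4_define(`_TOC_NO', 0)m4_dnl
m4_dnl Start the list in the normal output, then hold back the body text.
m4_define(`_TOC_BEGIN', `<UL>m4_divert(1)')m4_dnl
m4_dnl Write the TOC entry to file 0 and the anchored heading to file 1.
m4_dnl The # is quoted so that the rest of the line is still expanded.
m4_define(`_TOC_HEAD', `m4_define(`_TOC_NO', m4_incr(_TOC_NO))m4_dnl
m4_divert(0)<LI><A HREF="`#'sec`'_TOC_NO">$1</A>
m4_divert(1)<H2><A NAME="sec`'_TOC_NO">_TOC_NO. $1</A></H2>')m4_dnl
m4_dnl Close the list, then release the held-back body text after it.
m4_define(`_TOC_END', `m4_divert(0)</UL>
m4_undivert(1)')m4_dnl
<H1>Contents</H1>
_TOC_BEGIN
_TOC_HEAD(`Introduction')
<P>Some opening text.</P>
_TOC_HEAD(`Details')
<P>Some more text.</P>
_TOC_END
```

   The list of links comes out first, followed by the numbered headings
   and the body text that was parked in file 1. This is also why other
   uses of m4_divert in the text would interfere.
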

   Simple Tables

   Besides tables of contents, HTML can also present tabular information,
   which many browsers support. Here are some funky macros as a short cut
   to producing these tables. First, an example (see Figure 1) of their
   use:
<CENTER>
_Start_Table(BORDER=5)
_Table_Hdr(,Apples, Oranges, Lemons)
_Table_Row(England, 100,250,300)
_Table_Row(France,200,500,100)
_Table_Row(Germany,500,50,90)
_Table_Row(Spain,,23,2444)
_Table_Row(Denmark,,,20)
_End_Table
</CENTER>

   Figure 1. Example Table
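
   The table macros themselves are not listed in this article. One simple
   way they might be written, assuming GNU m4 with -P and a fixed four
   columns as in the example, is sketched below. These definitions are an
   illustration only and may differ from the author's:

```m4
m4_dnl Hypothetical four-column versions. The real macros are not shown.
m4_define(`_Start_Table', `<TABLE $1>')m4_dnl
m4_define(`_Table_Hdr',
 `<TR><TH>$1</TH><TH>$2</TH><TH>$3</TH><TH>$4</TH></TR>')m4_dnl
m4_define(`_Table_Row',
 `<TR><TD>$1</TD><TD>$2</TD><TD>$3</TD><TD>$4</TD></TR>')m4_dnl
m4_define(`_End_Table', `</TABLE>')m4_dnl
```

   An empty argument, as in _Table_Row(Spain,,23,2444), simply yields an
   empty cell. Leading spaces before an argument are dropped by m4, so
   "Apples, Oranges" and "Apples,Oranges" behave the same.
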

   m4 Gotchas

   Unfortunately, m4 needs some taming. A little time spent on
   familiarisation will pay dividends. Definitive documentation is
   available (for example, in the Emacs info documentation system) but,
   without being a complete tutorial, here are a few tips based on my
   experiences.

   Gotcha 1--Quotes

   m4's quotation characters are the grave accent (backquote) ` which
   starts the quote, and the apostrophe ' which ends it. It may help to
   put all arguments to macros in quotes, for example:
_HEAD1(`This is a heading')

   The main reason for using quotes is to prevent confusion if commas are
   contained in an argument to a macro, since m4 uses commas to separate
   macro parameters. For example, the line _CODE(foo, bar) would put the
   foo in the HTML output but not the bar. Use quotes in the line
   _CODE(`foo, bar'), and it works properly.
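
   The difference is easy to see with a cut-down _CODE macro (simplified
   here to a single tag for brevity), assuming GNU m4 with -P:

```m4
m4_define(`_CODE', `<CODE>$1</CODE>')m4_dnl
m4_dnl Unquoted: the comma splits the text into two arguments.
_CODE(foo, bar)
m4_dnl Quoted: the whole string is a single argument.
_CODE(`foo, bar')
```

   The first call should give <CODE>foo</CODE> and the second
   <CODE>foo, bar</CODE>.
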

   Gotcha 2--Word Swallowing

   The biggest problem with m4 is that some versions of it swallow key
   words that it recognises, such as include, format, divert, file, gnu,
   line, regexp, shift, unix, builtin and define. You can protect these
   words by putting them in single quotes, for example:
Smart people `include' Linux in their list
of computer essentials.

   The trouble is, this is both inconvenient and easy to forget.

   A safer way to protect keywords (my preference) is to invoke m4 with
   the -P or --prefix-builtins option. Then all built-in macro names are
   modified so that they all begin with the prefix m4_ and ordinary words
   are left as is. For example, using this option, one would write
   m4_define instead of define (as shown in the examples in this article).
   One hitch is that not all versions of m4 support this option--most
   notably some PC versions under MS-DOS.

   Gotcha 3--Comments

   Comments in m4 begin with the # character--everything from the # to
   the end of the line is copied to the output unchanged, without macro
   expansion. If you want to use # in the HTML page and still have the
   rest of the line expanded, you must quote it like this: `#'. Another
   option (my preference) is to change the m4 comment character to
   something exotic with a line like this:
m4_changecom(`[[[[')

   and not have to worry about # symbols in your text.
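
   A short experiment shows the effect, assuming GNU m4 with -P. _ANCHOR
   is a hypothetical macro used only for this demonstration:

```m4
m4_define(`_ANCHOR', `top')m4_dnl
<A HREF="#_ANCHOR">before</A>
m4_changecom(`[[[[')m4_dnl
<A HREF="#_ANCHOR">after</A>
```

   On the first link the # starts a comment, so _ANCHOR is left
   unexpanded. After m4_changecom the same line should come out with
   HREF="#top".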

   If you want to use comments in the m4 file but not have them appear in
   the final HTML file, use the macro m4_dnl (dnl = Delete to New Line).
   This macro discards everything up to and including the next newline
   character.
m4_define(_NEWMACRO, `foo bar')
m4_dnl This is a comment

   Yet another way to have source code ignored is the m4_divert command.
   The main purpose of m4_divert is to save text in a temporary buffer for
   inclusion in the file later--for example, in building a table of
   contents or index. However, if you divert to "-1", it just goes to
   limbo-land. This option is useful for getting rid of the whitespace
   generated by the m4_define command. For example:
m4_divert(-1) diversion on
m4_define(this ...)
m4_define(that ...)
m4_divert(0)m4_dnl diversion turned off

   Gotcha 4--Debugging

   Another tip for when things go wrong is to increase the number of error
   diagnostics that m4 outputs. The easiest way to do this is to add the
   following to your m4 file as debugging commands:
m4_debugmode(e)
m4_traceon
...
buggy lines
...
m4_traceoff
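
   Tracing can be combined with explicit checks that report problems on
   standard error. The sketch below assumes GNU m4 with -P, where the
   builtins errprint, __file__ and __line__ are spelled m4_errprint,
   m4___file__ and m4___line__. _REQUIRE is a hypothetical helper that
   complains when a macro the page depends on has not been defined:

```m4
m4_dnl _REQUIRE(name): warn on stderr if the macro name is not defined.
m4_define(`_REQUIRE',
 `m4_ifdef(`$1', `',
  `m4_errprint(m4___file__:m4___line__: `$1 is not defined
')')')m4_dnl
_REQUIRE(`_EMAIL_ADDRESS')
```

   If stdlib.m4 has not been included, this prints a message naming the
   file and line instead of silently leaving _EMAIL_ADDRESS unexpanded in
   the page.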

   Conclusion

   It should be noted that many web servers do offer an include
   directive (a server-side include, which is a feature of the server
   rather than of HTML itself) that looks like this:
<!--#include file="junk.html" -->

   However, the server-side include has the following limitations:
     * The work of including and interpreting the include is done on the
       server-side before downloading and adds overhead as the server has
       to scan files for include statements.
     * Most servers (especially public ISPs) deactivate this feature
       because of the large overhead.
     * Include is all you get--no macro substitution, no parameters to
       macros, no ifdef, etc., as with m4.

   There are several other features of m4 that I have not yet exploited in
   my HTML ramblings so far, such as regular expressions. It might be
   interesting to create a "standard" stdlib.m4 for general use with nice
   macros for general text processing and HTML functions. By all means
   download my version of stdlib.m4 as a base for your own hacking. I
   would be interested in hearing of useful macros, and if there is enough
   interest, maybe a Mini-HOWTO could evolve from this article.

   There are many additional advantages to using Linux to develop HTML
   pages, far beyond the simple assistance given by the typical typing
   aids and WYSIWYG tools. Certainly, I will go on using m4 until HTML
   catches up--I will then do my last make and drop back to using pure
   HTML. I hope you enjoy these little tricks and encourage you to
   contribute your own.

   Bob Hepple has been hacking at Unix since 1981 under a variety of
   excuses and has somehow been paid for it at least some of the time.
   It's allowed him to pursue another interest--living in warm, exotic
   countries including Hong Kong, Australia, Qatar, Saudi Arabia, Lesotho
   and (presently) Singapore. His initial aversion to the cold was learned
   in the UK. Ambition--to stop working for the credit card company and
   tax man and to get a real job. Bob can be reached at
   bhepple@pacific.net.sg.

References

   Visible links:
   1. https://www.linuxjournal.com/article/2393#main-content
   2. https://www.linuxjournal.com/
   3. https://www.linuxjournal.com/
   4. https://www.linuxjournal.com/tag/cloud
   5. https://www.linuxjournal.com/tag/containers
   6. https://www.linuxjournal.com/tag/desktop
   7. https://www.linuxjournal.com/tag/kernel
   8. https://www.linuxjournal.com/tag/mobile
   9. https://www.linuxjournal.com/tag/networking
  10. https://www.linuxjournal.com/tag/privacy
  11. https://www.linuxjournal.com/tag/programming
  12. https://www.linuxjournal.com/tag/security
  13. https://www.linuxjournal.com/tag/servers
  14. https://www.linuxjournal.com/tag/sysadmin
  15. https://www.linuxjournal.com/news
  16. https://www.linuxjournal.com/books
  17. https://www.linuxjournal.com/news
  18. https://www.linuxjournal.com/popular
  19. https://www.linuxjournal.com/recent
  20. https://www.linuxjournal.com/tag/howtos
  21. https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/023/2393/2393l1.html
  22. https://www.linuxjournal.com/article/2393#disqus_thread
  23. https://disqus.com/?ref_noscript
  24. https://slashdotmedia.com/privacy-statement/
  25. https://slashdotmedia.com/terms-of-use/
  26. https://www.linuxjournal.com/sponsors
  27. https://www.linuxjournal.com/content/masthead
  28. https://www.linuxjournal.com/author
  29. https://www.linuxjournal.com/form/contact
  30. https://www.linuxjournal.com/rss_feeds
  31. https://www.linuxjournal.com/aboutus

   Hidden links:
  33. https://youtube.com/linuxjournalonline
  34. https://www.facebook.com/linuxjournal/
  35. https://twitter.com/linuxjournal