Linux Journal
Writing HTML with m4
by Bob Hepple
on March 1, 1998
It's amazing how easy it is to write simple HTML pages--and the
availability of WYSIWYG (what you see is what you get) HTML editors
like Netscape Gold lulls one into a mood of "don't worry, be happy".
However, managing multiple, inter-related pages of HTML rapidly gets
very difficult. I recently had a slightly complex set of pages to put
together, and I started thinking, "there has to be an easier way."
I immediately turned to the WWW and looked up all sorts of tools--but
quite honestly I was rather disappointed. Mostly, they were what I
would call "typing aids"--instead of having to remember arcane
incantations like <a href="link">text</a>, you are given a
button or a magic keychord like alt-ctrl-j which remembers the syntax
and does all the typing for you.
Linux to the rescue--since HTML is built as ordinary text files, the
normal Linux text management tools can be used. This includes revision
control tools such as rcs and the text manipulation tools like awk,
Perl, etc. These tools offer significant help in version control and
managing development by multiple users as well as automating the
process of displaying information from a database (the classic
grep | sort | awk pipeline).
The use of these tools with HTML is documented elsewhere, e.g., Jim
Weirich's article in Linux Journal Issue 36, April 1997, "Using Perl to
Check Web Links". I highly recommend this article as yet another way to
really flex those Linux muscles when writing HTML.
What I will cover here is work I've done recently using the
pre-processor m4 to maintain HTML. The ideas can very easily be
extended to the more general SGML case.
Using m4
I decided to use m4 after looking at various other pre-processors
including cpp, the C preprocessor, which is perhaps a little too
C-specific to be useful with HTML. m4 is a generic and clean macro
expansion program, and it's available under most Unices including
Linux.
Instead of editing *.html files, I create *.m4 files with my favourite
text editor. These files look something like the following:
m4_include(stdlib.m4)
_HEADER(`This is my header')
<P>This is some plain text</P>
_HEAD1(`This is a main heading')
<P>This is some more plain text</P>
_TRAILER
The format is just HTML code, but you can include files and add macros
rather like in C. I use a convention that my new macros are in capitals
and start with an _ character to make them stand out from HTML language
and to avoid name-space collisions.
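The macros _HEADER, _HEAD1 and _TRAILER used above must be defined in
stdlib.m4. The following is only a minimal sketch of what such
definitions might look like; the HTML each macro emits is an assumption
made for illustration and is not taken from my actual stdlib.m4:

```m4
m4_divert(-1)
m4_dnl Hypothetical stdlib.m4 sketch. Diversion -1 discards the
m4_dnl blank lines that the definitions would otherwise emit.
m4_define(`_HEADER',
`<HTML><HEAD><TITLE>$1</TITLE></HEAD>
<BODY><H1>$1</H1>')
m4_define(`_HEAD1', `<H2>$1</H2>')
m4_define(`_TRAILER', `</BODY></HTML>')
m4_divert(0)m4_dnl
```

Every definition body is quoted so that it is expanded only when the
macro is called, not when it is defined.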
The m4 file is then converted into an .html file with the command:
m4 -P <file.m4 >file.html
This process is especially easy if you create a makefile to automate
these steps in the usual way. For example:
.SUFFIXES: .m4 .html
.m4.html:
	m4 -P <$*.m4 >$*.html
default: index.html
*.html: stdlib.m4
all: default PROJECT1 PROJECT2
PROJECT1:
	(cd project1; make all)
PROJECT2:
	(cd project2; make all)
Some of the most useful commands in m4 are listed here with their cpp
equivalents shown in parentheses:
* m4_include: includes a common file into your HTML (#include)
* m4_define: defines an m4 variable (#define)
* m4_ifdef: a conditional (#ifdef)
* m4_changecom: change the m4 comment character (normally #)
* m4_debugmode: control error diagnostics
* m4_traceon/off: turn tracing on and off
* m4_dnl: comment
* m4_incr, m4_decr: simple arithmetic
* m4_eval: more general arithmetic
* m4_esyscmd: execute a Linux command and use the output
* m4_divert(i): This is a little complicated, so skip on first
reading. It is a way of storing text for output at the end of
normal processing. It will come in useful later, when we get to
automatic numbering of headings. It sends output from m4 to a
temporary file number i. At the end of processing, any text which
was diverted is then output, in the order of the file number i.
File number -1 is the bit bucket and can be used to discard
chunks of text. File number 0 is the normal output stream.
Thus, for example, you can use m4_divert to divert text to file 1,
and it will only be output at the end.
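A small sketch may make the ordering of diversions clearer. The text is
made up for the purpose; process it with m4 -P:

```m4
m4_divert(1)m4_dnl
This line was written first but is held in diversion 1.
m4_divert(0)m4_dnl
This line was written second and goes straight to the output.
m4_divert(-1)
This text is thrown away.
m4_divert(0)m4_dnl
```

In the output the second line appears first, followed by the line held
in diversion 1. The text sent to diversion -1 never appears.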
Sharing HTML Elements Across Several Pages
In many "nests" of HTML pages, each page shares elements such as a
button bar containing links to other pages like this:
[Home] [Next] [Prev] [Index]
This is fairly easy to create in each page. The trouble is that if you
make a change in the "standard" button-bar then you have the tedious
job of finding each occurrence of it in every file and manually making
the changes. With m4 we can more easily do this job by putting the
shared elements into an m4_include statement, just like C.
Let's also automate the naming of pages by putting the following lines
into an include file called button_bar.m4:
m4_define(`_BUTTON_BAR',
<a href="homepage.html">[Home]</a>
<a href="$1">[Next]</a>
<a href="$2">[Prev]</a>
<a href="indexpage.html">[Index]</a>)
and then these lines in the document:
m4_include(`button_bar.m4')
_BUTTON_BAR(`page_after_this.html',
`page_before_this.html')
The $1 and $2 parameters in the macro definition are replaced by the
strings in the macro call.
Managing HTML elements that often change
It is troublesome to have items change in multiple HTML pages. For
example, if your e-mail address changes, you need to change all
references to it to the new address. Instead, with m4 you can put a
line like the following in your stdlib.m4 file:
m4_define(`_EMAIL_ADDRESS', `MyName@foo.bar.com')
and then just put _EMAIL_ADDRESS in your m4 files.
A more substantial example comes from building strings with multiple
components, any of which may change as the page is developed. If, like
me, you develop on one machine, test out the page and then upload to
another machine with a totally different address, then you could use
the m4_ifdef command in your stdlib.m4 file (just like the #ifdef
command in cpp). For example:
m4_define(`_LOCAL')
...
m4_define(`_HOMEPAGE',
m4_ifdef(`_LOCAL',
`//127.0.0.1/~YourAccount',
`http://ISP.com/~YourAccount'))
m4_define(`_PLUG', `<A HREF="http://www.ssc.com/linux/">
<IMG SRC="_HOMEPAGE/gif/powered.gif"
ALT="[Linux Information]"> </A>')
Note the careful use of quotes to prevent the variable _LOCAL from
being expanded. _HOMEPAGE takes on different values according to
whether the variable _LOCAL is defined or not. This definition can then
ripple through the entire project as you build the pages.
In this example, _PLUG is a macro to advertise Linux. When you are
testing your pages, use the local version of _HOMEPAGE. When you are
ready to upload, remove or comment out the _LOCAL definition in this
way:
m4_dnl m4_define(`_LOCAL')
... and then re-make.
Creating New Text Styles
Styles built into HTML include things like <EM> for emphasis and <CITE>
for citations. With m4 you can define your own new styles like this:
m4_define(`_MYQUOTE',
<BLOCKQUOTE><EM>$1</EM></BLOCKQUOTE>)
If, later, you decide you prefer <STRONG> instead of <EM>, it is a
simple matter to change the definition. Then, every _MYQUOTE paragraph
falls into line with a quick make.
The classic guides to good HTML writing say things like "It is strongly
recommended that you employ the logical styles such as <EM>...</EM>
rather than the physical styles such as <I>...</I> in your documents."
Curiously, the WYSIWYG editors for HTML generate purely physical
styles. Using the m4 styles may be a good way to keep on using logical
styles.
Typing and Mnemonic Aids
I don't depend on WYSIWYG editing (having been brought up on troff) but
all the same I'm not averse to using help where it's available. There
is a choice (and maybe it's a fine line) to be made between:
<BLOCKQUOTE><PRE><CODE>Some code you want to display.
</CODE></PRE></BLOCKQUOTE>
and:
_CODE(Some code you want to display.)
In this case, you would define _CODE like this:
m4_define(`_CODE',
<BLOCKQUOTE><PRE><CODE>$1</CODE></PRE></BLOCKQUOTE>)
Which version you prefer is a matter of taste and convenience although
the m4 macro certainly saves some typing. Another example I like to
use, since I can never remember the syntax for links, is:
m4_define(`_LINK', <a href="$1">$2</a>)
Then, instead of typing:
<a href="URL_TO_SOMEWHERE">Click here to get to SOMEWHERE
</a>
I type:
_LINK(`URL_TO_SOMEWHERE', `Click here to get to SOMEWHERE')
Automatic Numbering
m4 has a simple arithmetic facility with two operators m4_incr and
m4_decr. This facility can be used to create automatic numbering,
perhaps for headings, for example:
m4_define(_CARDINAL,0)
m4_define(_H, `m4_define(`_CARDINAL',
m4_incr(_CARDINAL))<H2>_CARDINAL.0 $1</H2>')
_H(First Heading)
_H(Second Heading)
This produces:
<H2>1.0 First Heading</H2>
<H2>2.0 Second Heading</H2>
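The same idea extends to more than one level. The following is a sketch
only; the names _SEC, _SUB, _NUM_H1 and _NUM_H2 are invented for this
example and are not the macros in my stdlib.m4:

```m4
m4_define(`_SEC', 0)m4_dnl section counter
m4_define(`_SUB', 0)m4_dnl subsection counter
m4_dnl A new section bumps _SEC and resets _SUB.
m4_define(`_NUM_H1', `m4_define(`_SEC', m4_incr(_SEC))m4_define(`_SUB', 0)<H2>_SEC. $1</H2>')m4_dnl
m4_dnl A new subsection bumps _SUB only.
m4_define(`_NUM_H2', `m4_define(`_SUB', m4_incr(_SUB))<H3>_SEC._SUB $1</H3>')m4_dnl
_NUM_H1(`Introduction')
_NUM_H2(`Background')
_NUM_H2(`Scope')
_NUM_H1(`Method')
```

This should number the headings 1., 1.1, 1.2 and 2. in turn. The macro
names are quoted in each m4_define so that the counters can be redefined
after they already hold a value.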
Automatic Date Stamping
For simple date stamping of HTML pages, I use the m4_esyscmd command to
maintain an automatic timestamp on every page:
This page was last updated on m4_esyscmd(date)
which produces:
This page was last updated on Fri May 9 10:35:03 HKT 1997
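One thing to watch is that the output of date ends with a newline, and
that m4 rescans the output for macros and commas. A hedged variation,
assuming your date command accepts a +format argument, is to choose a
plain numeric format and delete the newline with m4_translit:

```m4
This page was last updated on m4_translit(m4_esyscmd(`date +%Y-%m-%d'), `
').
```

The second argument to m4_translit is a quoted newline. With no third
argument, m4_translit deletes every character listed in the second.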
Generating Tables of Contents
Using m4 allows you to define commonly repeated phrases and use them
consistently. I hate repeating myself because I am lazy and because I
make mistakes, so I find this feature an absolute necessity.
A good example of the power of m4 is in building a table of contents in
a big page. This involves repeating the heading title in the table of
contents and then in the text itself. This is tedious and error-prone,
especially when you change the titles. There are specialised tools for
generating a table of contents from HTML pages, but the simple facility
provided by m4 is irresistible to me.
Simple To Understand TOC
The following example is a fairly simple-minded table of contents
generator. First, create some useful macros in stdlib.m4:
m4_define(`_LINK_TO_LABEL',
`<A HREF="#$1">$1</A>')
m4_define(`_SECTION_HEADER',
`<A NAME="$1"><H2>$1</H2></A>')
Then define all the section headings in a table at the start of the
page body:
m4_define(`_DIFFICULTIES',
`The difficulties of HTML')
m4_define(`_USING_M4', `Using
<EM>m4</EM>')
m4_define(`_SHARING', `Sharing HTML
Elements Across Several Pages')
Then build the table:
<UL><P>
<LI> _LINK_TO_LABEL(_DIFFICULTIES)
<LI> _LINK_TO_LABEL(_USING_M4)
<LI> _LINK_TO_LABEL(_SHARING)
</UL>
Finally, write the text:
...
_SECTION_HEADER(_DIFFICULTIES)
...
The advantages of this approach are twofold. If you change your
headings you only need to change them in one place, and the table of
contents is then automatically regenerated. Also, the links are
guaranteed to work.
Simple To Use TOC
The table of contents generator that I normally use is a bit more
complex and requires a bit more study, but it is much easier to use. It
not only builds the table, but it also automatically numbers the
headings on the fly--up to four levels of numbering (e.g., section
3.2.1.3), although this can be easily extended. It is very simple to
use as follows:
1. Where you want the table to appear, call _Start_TOC.
2. At every heading use _H1(`Heading for level 1') or _H2(`Heading for
level 2') as appropriate.
3. After the last line of HTML code (probably </HTML>), call _End_TOC.
The code for these macros is shown in [21]Listing 1. One restriction is
that you should not use diversions (i.e., m4_divert) within your text,
unless you preserve the diversion to file 1 used by this TOC generator.
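To show the principle without the full listing, here is a much reduced,
single-level sketch. It is not the code from Listing 1; the names
_TOC_N, _TOC_BEGIN, _TOC_HEAD and _TOC_END and the sec-N anchor names
are invented for this example. Table entries go to the normal output as
each heading is met, while the body text is held in diversion 1 and
released after the table is closed:

```m4
m4_divert(-1)
m4_dnl Hypothetical single-level TOC sketch; names are illustrative only.
m4_define(`_TOC_N', 0)
m4_dnl Open the list in the normal output, then hold the body in diversion 1.
m4_define(`_TOC_BEGIN', `<UL>
m4_divert(1)')
m4_dnl Each heading writes a link to the output and the heading to diversion 1.
m4_define(`_TOC_HEAD', `m4_define(`_TOC_N', m4_incr(_TOC_N))m4_dnl
m4_divert(0)<LI><A HREF="`#'sec-_TOC_N">_TOC_N. $1</A>
m4_divert(1)<H2><A NAME="sec-_TOC_N">_TOC_N. $1</A></H2>')
m4_dnl Close the list, then release the held body text after it.
m4_define(`_TOC_END', `m4_divert(0)</UL>
m4_undivert(1)')
m4_divert(0)m4_dnl
_TOC_BEGIN
_TOC_HEAD(`Introduction')
Some text.
_TOC_HEAD(`Details')
More text.
_TOC_END
```

The output should begin with the complete list of links, followed by the
numbered headings and their text. The # in the link is quoted so that m4
does not treat the rest of the line as a comment.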
Simple Tables
Besides tables of contents, there are ordinary HTML tables, which many
browsers support. Here are some funky macros as a short cut to producing
these tables. First, an example (see Figure 1) of their use:
<CENTER>
_Start_Table(BORDER=5)
_Table_Hdr(,Apples, Oranges, Lemons)
_Table_Row(England, 100,250,300)
_Table_Row(France,200,500,100)
_Table_Row(Germany,500,50,90)
_Table_Row(Spain,,23,2444)
_Table_Row(Denmark,,,20)
_End_Table
</CENTER>
Figure 1. Example Table
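The table macros live in stdlib.m4 and are not reproduced in this
article. As a rough guide, they could be written along the following
lines. This is a sketch under my own assumptions, not the real
definitions, and the helper _CELLS is a hypothetical name:

```m4
m4_divert(-1)
m4_dnl Hypothetical definitions; the real ones may differ.
m4_define(`_Start_Table', `<TABLE $1>')
m4_define(`_End_Table', `</TABLE>')
m4_dnl _CELLS(tag, cell, cell, ...) wraps the first cell in <tag> and
m4_dnl recurses over the remaining cells while any are left.
m4_define(`_CELLS', `<$1>$2</$1>m4_ifelse(m4_eval($# > 2), 1,
`_CELLS(`$1', m4_shift(m4_shift($@)))')')
m4_define(`_Table_Hdr', `<TR>_CELLS(`TH', $@)</TR>')
m4_define(`_Table_Row', `<TR>_CELLS(`TD', $@)</TR>')
m4_divert(0)m4_dnl
```

Because the recursion stops when only the tag and one cell remain, empty
arguments such as the first one in _Table_Hdr(,Apples, Oranges, Lemons)
simply produce empty cells.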
m4 Gotchas
Unfortunately, m4 needs some taming. A little time spent on
familiarisation will pay dividends. Definitive documentation is
available (for example, in the Emacs info documentation system) but,
without being a complete tutorial, here are a few tips based on my
experiences.
Gotcha 1--Quotes
m4's quotation characters are the grave accent ` which starts the
quote, and the acute accent ' which ends it. It may help to put all
arguments to macros in quotes, for example:
_HEAD1(`This is a heading')
The main reason for using quotes is to prevent confusion if commas are
contained in an argument to a macro, since m4 uses commas to separate
macro parameters. For example, the line _CODE(foo, bar) would put the
foo in the HTML output but not the bar. Use quotes in the line
_CODE(`foo, bar'), and it works properly.
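The difference is easy to see with a throwaway macro. The name _DEMO is
made up for this sketch:

```m4
m4_define(`_DEMO', `<CODE>$1</CODE>')m4_dnl
_DEMO(foo, bar)
_DEMO(`foo, bar')
```

The first call passes two arguments, so only foo reaches $1 and the
output is <CODE>foo</CODE>. The second call passes one quoted argument
and produces <CODE>foo, bar</CODE>.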
Gotcha 2--Word Swallowing
The biggest problem with m4 is that some versions of it swallow key
words that it recognises, such as include, format, divert, file, gnu,
line, regexp, shift, unix, builtin and define. You can protect these
words by putting them in m4 quotes, for example:
Smart people `include' Linux in their list
of computer essentials.
The trouble is, this is both inconvenient and easy to forget.
A safer way to protect keywords (my preference) is to invoke m4 with
the -P or --prefix-builtins option. Then all built-in macro names are
modified so that they all begin with the prefix m4_ and ordinary words
are left as is. For example, using this option, one would write
m4_define instead of define (as shown in the examples in this article).
One hitch is that not all versions of m4 support this option--most
notably some PC versions under MS-DOS.
Gotcha 3--Comments
Comments in m4 begin with the # character--everything from the # to the
end of the line is copied to the output unchanged, with no macro
expansion. If you want to
use # in the HTML page, you must quote it like this: `#'. Another
option (my preference) is to change the m4 comment character to
something exotic with a line like this:
m4_changecom(`[[[[')
and not have to worry about # symbols in your text.
If you want to use comments in the m4 file but not have them appear in
the final HTML file, use the macro m4_dnl (dnl = Delete to New Line).
This macro suppresses everything until the next newline character.
m4_define(_NEWMACRO, `foo bar')
m4_dnl This is a comment
Yet another way to have source code ignored is the m4_divert command.
The main purpose of m4_divert is to save text in a temporary buffer for
inclusion in the file later--for example, in building a table of
contents or index. However, if you divert to "-1", it just goes to
limbo-land. This option is useful for getting rid of the whitespace
generated by the m4_define command. For example:
m4_divert(-1) diversion on
m4_define(this ...)
m4_define(that ...)
m4_divert diversion turned off
Gotcha 4--Debugging
Another tip for when things go wrong is to increase the number of error
diagnostics that m4 outputs. The easiest way to do this is to add the
following to your m4 file as debugging commands:
m4_debugmode(e)
m4_traceon
...
buggy lines
...
m4_traceoff
Conclusion
It should be noted that many web servers offer an include statement
(server-side includes, which are a server feature and not part of HTML
itself) that looks like this:
<!--#include file="junk.html" -->
However, the server-side include has the following limitations:
* The work of including and interpreting the include is done on the
server-side before downloading and adds overhead as the server has
to scan files for include statements.
* Most servers (especially public ISPs) deactivate this feature
because of the large overhead.
* Include is all you get--no macro substitution, no parameters to
macros, no ifdef, etc., as with m4.
There are several other features of m4 that I have not yet exploited in
my HTML ramblings so far, such as regular expressions. It might be
interesting to create a "standard" stdlib.m4 for general use with nice
macros for general text processing and HTML functions. By all means
download my version of stdlib.m4 as a base for your own hacking. I
would be interested in hearing of useful macros, and if there is enough
interest, maybe a Mini-HOWTO could evolve from this article.
There are many additional advantages to using Linux to develop HTML
pages, far beyond the simple assistance given by the typical typing
aids and WYSIWYG tools. Certainly, I will go on using m4 until HTML
catches up--I will then do my last make and drop back to using pure
HTML. I hope you enjoy these little tricks and encourage you to
contribute your own.
Bob Hepple has been hacking at Unix since 1981 under a variety of
excuses and has somehow been paid for it at least some of the time.
It's allowed him to pursue another interest--living in warm, exotic
countries including Hong Kong, Australia, Qatar, Saudi Arabia, Lesotho
and (presently) Singapore. His initial aversion to the cold was learned
in the UK. Ambition--to stop working for the credit card company and
tax man and to get a real job. Bob can be reached at
bhepple@pacific.net.sg.
References
Visible links:
1. https://www.linuxjournal.com/article/2393#main-content
2. https://www.linuxjournal.com/
3. https://www.linuxjournal.com/
4. https://www.linuxjournal.com/tag/cloud
5. https://www.linuxjournal.com/tag/containers
6. https://www.linuxjournal.com/tag/desktop
7. https://www.linuxjournal.com/tag/kernel
8. https://www.linuxjournal.com/tag/mobile
9. https://www.linuxjournal.com/tag/networking
10. https://www.linuxjournal.com/tag/privacy
11. https://www.linuxjournal.com/tag/programming
12. https://www.linuxjournal.com/tag/security
13. https://www.linuxjournal.com/tag/servers
14. https://www.linuxjournal.com/tag/sysadmin
15. https://www.linuxjournal.com/news
16. https://www.linuxjournal.com/books
17. https://www.linuxjournal.com/news
18. https://www.linuxjournal.com/popular
19. https://www.linuxjournal.com/recent
20. https://www.linuxjournal.com/tag/howtos
21. https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/023/2393/2393l1.html
22. https://www.linuxjournal.com/article/2393#disqus_thread
23. https://disqus.com/?ref_noscript
24. https://slashdotmedia.com/privacy-statement/
25. https://slashdotmedia.com/terms-of-use/
26. https://www.linuxjournal.com/sponsors
27. https://www.linuxjournal.com/content/masthead
28. https://www.linuxjournal.com/author
29. https://www.linuxjournal.com/form/contact
30. https://www.linuxjournal.com/rss_feeds
31. https://www.linuxjournal.com/aboutus