From: "Salle Arobase" <salle.arob...@ville-rochefort.fr>
Newsgroups: comp.os.cpm
Subject: Re: French Luser news
Date: Tue, 19 Aug 2003 16:08:08 +0200
Organization: Ville de Rochefort
Lines: 580
Message-ID: <bhtagr$s89$1@news-reader4.wanadoo.fr>
References: <be3ocf$k4v$1@news-reader1.wanadoo.fr> <bf4dtp$sg8$1@news.hobby.nl> <1058421800snz@nospam.demon.co.uk> <bfbfjm$7kq$1@news-reader1.wanadoo.fr> <1058725031snz@nospam.demon.co.uk> <bfj8n6$3ar$1@news-reader4.wanadoo.fr> <1058904250snz@nospam.demon.co.uk> <bfr17q$jml$1@news-reader5.wanadoo.fr>
Reply-To: "Salle Arobase" <salle.arob...@ville-rochefort.fr>
NNTP-Posting-Host: apoitiers-106-2-3-98.w81-248.abo.wanadoo.fr
X-Trace: news-reader4.wanadoo.fr 1061301595 28937 81.248.43.98 (19 Aug 2003 13:59:55 GMT)
X-Complaints-To: abuse@wanadoo.fr
NNTP-Posting-Date: 19 Aug 2003 13:59:55 GMT
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165

BYTESTAT.TXT  by Emmanuel ROCHE
------------

A solution in search of a problem...

Since last time, I have been  busy working  on the  WS4 to  HTML
converter. I had a lot to learn about the internals  of WS4  and
HTML. Since the main  problem was  how to  display properly  WS4
tables, I was surprised to see  how difficult  it is  with HTML.
There are hundreds of Web pages dealing with this subject alone.

It  seems  that the  origin of  this difficulty  is the  lack of
backward compatibility. Instead of starting from  an ASR-33  TTY
with its 72 columns of (monospaced) characters, "the powers that
are on the Internet" started with the number of pixels displayed
on the screen.  So,  when you  want to  display something  under
HTML, you are obliged to say how many pixels to use...

Of course, since most stuff does not fit on a single "page", you
need to add scroll bars ("elevators") on the right side, and a
border around your text, all taking some more pixels from the
screen. As a result, most of the texts I have read so far advise
assuming that the lowest screen width to design for is 600
pixels, rather than 640...

Well, this will make for interesting stuff when CP/M computers
get a Browser, since most of them did not have 640 pixels. Back
then, in the prehistoric dark ages, we used to think in terms of
the number of characters displayed... and most CP/M computers
were able to display 80 characters (for example, to use
WordStar), but rarely had 640 pixels (use CP/M on an Apple IIe,
and you will understand).

One day, as I was thinking about this strange request for 64
columns only (even in the 21st Century), it came to me that we
usually think in decimal. And we often use percentages, and
values less than 100. For instance, in France phone numbers
have 10 digits, which are written as 01.23.45.67.89 (this could
be a valid phone number) and read out in pairs. You never
pronounce a "hundred", only values less than 100, which are the
ones most often used. So, I asked myself: "Would it be possible
to display 100 things on a 64-column line?"

I noticed that 64 is slightly more than 50. The problem being
that, to display 100 things on 50 columns, you need one
character that displays 2 symbols (and that already exists as a
character), plus one that displays a single symbol... Now, it so
happens that, in French, the colon is "deux points" (two dots)
and, of course, the period is "point (final)" ((ending) dot).
So, here I had my 2 characters, one displaying one symbol, the
other displaying the same symbol twice.

(As you can see, I think a lot, and sometimes (most of the
time?) about things that are "obvious" to everybody else. Me, I
spend my time asking: "Why... this or that?" I seem never to
have grown up.)

I jumped to my computer, and  produced the  following histogram,
which should be self-explanatory:

run"percent
  0|
  1|.
  2|:
  3|:.
  4|::
  5|::.
  6|:::
  7|:::.
  8|::::
  9|::::.
 10|:::::
Break in 50
Ok

So, we are now able to display percentages on a 64-column line.
The program which produced the above follows:

list
10 REM PERCENT.BAS  by Emmanuel ROCHE
20 :
30 FOR i = 0 TO 100
40     GOSUB 90
50     IF i MOD 21 = 20 THEN WHILE INKEY$ = "" : WEND
60 NEXT i
70 END
80 :
90 ' Percent
100 PRINT USING "###" ; i ;
110 PRINT "|" ;
120 ' 0 = even
130 ' 1 = odd
140 IF i MOD 2 = 0 THEN PRINT STRING$ (i/2, ":") ELSE PRINT
STRING$ ((i-1)/2, ":") "."
150 RETURN
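
The same bar can be sketched in Python (not the BASIC used in
this post), as an illustration only. The function name
percent_bar is hypothetical; the rule is the one from line 140:
one ":" per two percent, plus a trailing "." when the value is
odd.

```python
def percent_bar(value):
    """Return a bar for a whole percentage from 0 to 100.

    Each ':' stands for two percent, a trailing '.' for one odd percent.
    """
    if not 0 <= value <= 100:
        raise ValueError("value must be between 0 and 100")
    return ":" * (value // 2) + "." * (value % 2)


# Same layout as the run above: 3-column number, a border, then the bar.
for i in range(11):
    print(f"{i:3d}|{percent_bar(i)}")
```

The longest bar, for 100, is 50 characters, so the number, the
border and the bar together stay well under 64 columns.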

Now, it so happens that, while working on the WS4 to HTML
converter, I was wondering which characters are used most often
in a file. A difficult question, since characters are coded
using bytes with 256 values, and most files hold thousands of
characters... Counting them by hand would be quite a chore!

Now, it so happens  that, recently,  I wrote  a "general-purpose
filter program in BASIC". Instead of acting upon  the occurrence
of each of the 256 possible values of a byte, a simple variation
of this program, counting the number of times a  particular byte
value was found inside a file, would solve this problem...
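
The counting idea can be sketched in a few lines of Python (an
illustration under my own assumptions, not the BASIC program
discussed here). The name count_bytes and the sample string are
hypothetical. The point shown is that the byte value itself can
serve as the index into a 256-entry table.

```python
def count_bytes(data):
    """Return a list of 256 counters, one per possible byte value."""
    counts = [0] * 256
    for value in data:      # iterating over bytes yields integers 0..255
        counts[value] += 1  # the byte value itself is the table index
    return counts


sample = b"hello world"
counts = count_bytes(sample)
print(counts[ord("l")], counts[ord(" ")], sum(counts))
```

For a real file, the data could be obtained with
open(name, "rb").read(), where name is whatever file is to be
examined.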

But, there are still some problems on the way. How do you prove
that such a program works accurately? Everything in a computer
(including files and bytes) is based on powers of 2, while we
think in decimal (and want the results displayed in
percentages!)

The only solution was to create some test  files. I  re-used the
MAKEASCF.BAS program that was mentioned in the FILTER.TXT  file.
To be sure that the file really contains the wanted  values, the
only solution is to inspect it with a DUMP program.

Ok
dir *.bin
 ASCII   .BIN  ZEROES  .BIN  ZEREOF  .BIN  SUITEA  .BIN

Ok
run"dumpfile

DUMPFILE: Enter filename.ext: ? ascii.bin

0000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F ................
0010: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F ................
0020: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F  !"#$%&'()*+,-./
0030: 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 0123456789:;<=>?
0040: 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F @ABCDEFGHIJKLMNO
0050: 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F PQRSTUVWXYZ[\]^_
0060: 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F `abcdefghijklmno
0070: 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F pqrstuvwxyz{|}~.
0080: 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F ................
0090: 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F ................
00A0: A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF ................
00B0: B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF ................
00C0: C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF ................
00D0: D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF ................
00E0: E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF ................
00F0: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF ................

Ok

I know what those people (probably using 2-GHz computers) who
ask me to use 64 columns will complain about: the above dump is
71 columns wide... Well, would it not be about time to upgrade
to an ASR-33 TTY, you guys? At least, you could print those
dumps on a 30-year-old Teletype. What use are Windows and 2-GHz
computers, if they are not able to display 80 columns of text on
17" screens?

So, these were the usual 256 values of a byte, with their
corresponding USASCII characters. Now, let us see what our
program displaying the percentages of usage of byte values in a
file produces:

run"bytestat

BYTESTAT: Enter filename.typ : ? ascii.bin

Percentages of bytes usage inside file ASCII.BIN.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) n

Ok

Big surprise! We have just seen that there is one occurrence of
each byte value in the file, yet the program says that each one
occurs "zero percent"! The reason is that we think in decimal
(or hundreds), and the file holds 256 bytes holding the 256
possible values of a byte. And 1/256 = 0.0039, that is, 0.39
percent. Since the program only displays whole percentages (it
assumes that no single value will occur 100% of the time in all
the bytes, so it only uses 2 digits to display the percentages)
and the value is 0.39, it is rounded to "00" (which is
automatically shrunk down to "0").
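
The arithmetic can be checked with a short Python sketch (an
illustration, not part of the BASIC program). The helper name
whole_percent is hypothetical, and rounding half up is my
assumption about how the 2-digit field rounds.

```python
def whole_percent(count, total):
    """Percentage of `count` in `total`, rounded half up to a whole number."""
    return int(count * 100 / total + 0.5)


print(1 * 100 / 256)          # 0.390625, i.e. about 0.39 percent
print(whole_percent(1, 256))  # 0
print(whole_percent(2, 256))  # 1  (0.78 percent rounds up)
```

So a value has to occur at least twice in 256 bytes before it
shows up as anything other than zero.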

Well... This seems a reasonable explanation. Could we test the
case that should never occur, when all the bytes have the same
value? Sure, we modify one line of MAKEASCF so that, instead of
writing the value of the loop index into the file, it writes a
00h byte 256 times. One more DUMP to be sure that the file
really contains them. See below.

Just one remark about the use of "0." and ".0". This goes back
to the ANSI text (1967?) defining the ASCII character set. For
some unknown reason, the table was displayed vertically. I had a
slight problem understanding what the axes of the ASCII table
were (at the beginning, since I started programming using EBCDIC
on IBM Mainframes. I was a COBOL programmer). Since then (many,
many years ago...), I have used this way of indicating which is
the "high order axis" and which is the "low order axis". This
works nicely when displaying values of 2 digits at most. When
displaying only single-digit values (or characters), I hope that
the reader will understand that the table is positioned
horizontally.
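
As an illustration of the two axes, here is a small Python
sketch (not the BASIC of this post). The names nibbles and
axis_labels are hypothetical. It splits a byte value into its
high and low hex digits and builds the "h." row label and the
".l" column label used in the tables.

```python
def nibbles(byte):
    """Split a byte value into (high nibble, low nibble)."""
    return divmod(byte, 16)


def axis_labels(byte):
    """Return the row label ("h.") and column label (".l") for a byte."""
    high, low = nibbles(byte)
    return f"{high:X}.", f".{low:X}"


print(axis_labels(0x1A))  # ('1.', '.A')
```

So 1Ah is found at the crossing of row "1." and column ".A".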

(The carriage of the printwheel of my TTY runs horizontally, not
vertically. ASCII was standardised based on the TTY, which was a
best-seller, the standard I/O device for more than 20 years.   A
whole generation learned to use computers using one. Screens (or
"glass TTYs") were quite a revolution when they were introduced.
In fact, the first CP/M system had no screen (hence the names of
the "virtual devices" of CP/M 2.2: CON, PUN, RDR, LST,  and NUL,
which was a wheel inside the  TTY generating  a standard  answer
message of 40 characters  (a string  of 00h  if not  set), as  I
explained several times.))

run"dumpfile

DUMPFILE: Enter filename.ext: ? zeroes.bin

0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

Ok
run"bytestat

BYTESTAT: Enter filename.typ : ? zeroes.bin

Percentages of bytes usage inside file ZEROES.BIN.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| %100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) y

00|::::::::::::::::::::::::::::::::::::::::::::::::::

Ok

The program worked correctly. It is not designed to display
values above 99%, so an overflow occurred, which produced the
"%100" displayed above. The other 255 byte values are not used,
so count for "zero percent". Finally, we take this opportunity
to test the output of the histogram. When we move the cursor to
the end of the line, the word processor indicates "Column 54".
Since there are 50 colons, the 2-digit hex value at left and a
border (and the cursor), we are correct in getting 54 columns.
(We could even have preceded the histogram with a space or a tab
(8+54=62)...)
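
The overflow can be imitated with a small Python sketch (an
illustration only, not the BASIC formatter itself). The name
print_using_2 is hypothetical, rounding half up is my
assumption, and only non-negative values are handled. It rounds
the value, right-aligns it in 2 columns, and puts a "%" in front
when the result needs more than 2 digits.

```python
def print_using_2(value):
    """Mimic a two-digit numeric field: round, right-align in 2 columns,
    and flag anything wider with a leading '%'."""
    text = str(int(value + 0.5))
    if len(text) > 2:
        return "%" + text   # field overflow marker
    return text.rjust(2)


print("[" + print_using_2(50) + "]")    # [50]
print("[" + print_using_2(0.39) + "]")  # [ 0]
print("[" + print_using_2(100) + "]")   # [%100]
```

The overflowing field is 4 characters wide instead of 2, which
is why the first row of the table above is pushed to the right.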

Now, let us see what happens when testing the contents of a file
filled with only 2 values: 00h and 1Ah (zero and eof):

run"dumpfile

DUMPFILE: Enter filename.ext: ? zereof.bin

0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
0090: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00A0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00B0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00C0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00D0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00E0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00F0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................

Ok
run"bytestat

BYTESTAT: Enter filename.typ : ? zereof.bin

Percentages of bytes usage inside file ZEREOF.BIN.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| 50  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0 50  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) y

00|:::::::::::::::::::::::::
1A|:::::::::::::::::::::::::

Ok

It works. 00h is used 50% of the time. 1Ah  is used  50% of  the
time. The remaining 254 values are used 0% of the time.

Let us finish with a more difficult case. We start  with a  line
of zeroes, then one "1", then two "2", then three "3", etc.  See
the following DUMP to see the internals of the file.

run"dumpfile

DUMPFILE: Enter filename.ext: ? suitea.bin

0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020: 02 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030: 03 03 03 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040: 04 04 04 04 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050: 05 05 05 05 05 00 00 00 00 00 00 00 00 00 00 00 ................
0060: 06 06 06 06 06 06 00 00 00 00 00 00 00 00 00 00 ................
0070: 07 07 07 07 07 07 07 00 00 00 00 00 00 00 00 00 ................
0080: 08 08 08 08 08 08 08 08 00 00 00 00 00 00 00 00 ................
0090: 09 09 09 09 09 09 09 09 09 00 00 00 00 00 00 00 ................
00A0: 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 00 00 00 00 00 00 ................
00B0: 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 00 00 00 00 00 ................
00C0: 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 00 00 00 00 ................
00D0: 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 00 00 00 ................
00E0: 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 00 00 ................
00F0: 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 00 ................

Ok
run"bytestat

BYTESTAT: Enter filename.typ : ? suitea.bin

Percentages of bytes usage inside file SUITEA.BIN.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| 53  0  1  1  2  2  2  3  3  4  4  4  5  5  5  6
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) y

00|::::::::::::::::::::::::::.
01|
02|.
03|.
04|:
05|:
06|:
07|:.
08|:.
09|::
0A|::
0B|::
0C|::.
0D|::.
0E|::.
0F|:::

Ok
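
The SUITEA.BIN figures can be cross-checked with a Python sketch
(an illustration, independent of the BASIC program). It rebuilds
the 256 bytes from the dump above, counts them, and rounds the
percentages half up, which is my assumption about the rounding.

```python
# Rebuild the 256 bytes shown in the SUITEA.BIN dump: row n (n = 0..15)
# holds n copies of the value n, padded with zeroes to 16 bytes.
data = bytes(v for n in range(16) for v in [n] * n + [0] * (16 - n))

counts = [0] * 256
for value in data:
    counts[value] += 1

# Whole percentages, rounded half up (an assumption about the rounding).
row0 = [int(counts[v] * 100 / len(data) + 0.5) for v in range(16)]
print(counts[0], len(data))  # 136 256
print(row0)
```

136 zeroes out of 256 bytes is 53.125 percent, hence the "53" in
the first cell, and the remaining values of row0 agree with the
"0." row of the table.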

As can be seen (I  hope), the  program produces  correct values.
Now that we have some confidence in the program, let us see what
it displays when computing the percentage of use of bytes inside
WS4 files.

run"bytestat

BYTESTAT: Enter filename.typ : ? printtst.ws4

Percentages of bytes usage inside file PRINTTST.WS4.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.|  0  0  1  0  0  0  0  0  0  0  3  0  0  2  0  0
1.|  0  0  0  0  0  0  0  0  0  0  1  1  3  0  0  0
2.| 21  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
3.|  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  4  1  3  1  4  1  0  2  4  0  0  2  1  3  3
7.|  2  0  3  1  4  1  0  1  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0
A.|  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  1  2  0  0  0  0  0  0  0  0  1  0
F.|  0  0  1  1  1  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) n

Ok

Big surprise! 20h (that is to say: "space") accounts for 21% of
the bytes. One char out of five in a WS4 file is a space... We
must, for sure, test for "spaces" before any other character
when scanning a file!

Another surprise is that none of the uppercase letters (41h to
5Ah) make it, despite so many sentences starting with an A or a
T. Instead, notice the percentage of use of lowercase letters
(61h to 7Ah), and particularly "a" (4%), "e" (4%), "i" (4%) and
"t" (4%). This time, we find "a" and "t", but not in the case we
were expecting.
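
To decide which characters to test for first, the byte values
can be ranked by count. Here is a Python sketch (an
illustration, not part of BYTESTAT). The name most_frequent and
the sample sentence are hypothetical.

```python
def most_frequent(data, n=5):
    """Return the n most frequent byte values as (value, count) pairs,
    most frequent first; ties keep ascending byte order."""
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    used = [(v, c) for v, c in enumerate(counts) if c > 0]
    used.sort(key=lambda pair: -pair[1])   # stable sort keeps byte order on ties
    return used[:n]


sample = b"the cat sat on the mat"
for value, count in most_frequent(sample, 3):
    print(f"{value:02X}h {chr(value)!r} {count}")
```

Even in this tiny sample, the space comes out on top, followed
by "t" and "a".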

The above table provides a fascinating view inside the
internals of WordStar 4. But I won't bore you with the details.

Since writing this program (you will find a summary of the
listing below; the missing lines are just repetitions of the
enclosing patterns), I have been wondering what other uses it
could be put to. If you have any ideas, let us know.

list
10 REM BYTESTAT.BAS  by Emmanuel ROCHE
20 :
30 PRINT
40 INPUT "BYTESTAT: Enter filename.typ : " ; file$
50 PRINT
60 nofile$ = FIND$ (file$)
70 IF nofile$ = "" THEN PRINT CHR$ (7) "File not found." : PRINT : END
80 OPTION BASE 0
90 DIM t (&HFF)
100 tot = 0
110 OPEN "R", 1, file$, 1
120 FIELD #1, 1 AS byte$
130 :
140 GET #1
150 IF EOF (1) THEN GOTO 230
160 byte = ASC (byte$)
170 hini = INT (byte / 16)
180 loni = byte - hini * 16
190 GOSUB 510
200 tot = tot + 1
210 GOTO 140  ' Main Loop
220 :
230 PRINT "Percentages of bytes usage inside file " UPPER$ (file$) "."
240 PRINT
250 PRINT "% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F"
260 PRINT "--+------------------------------------------------"
270 FOR i = 0 TO &HF
280     PRINT HEX$ (i) ".|" ;
290     FOR j = 0 TO &HF
300         PRINT " " USING "##" ; (t (i * 16 + j) ) * 100 / tot ;
310     NEXT j
320     PRINT
330 NEXT i
340 PRINT
350 :
360 z$ = "" : PRINT "Do you want a histogram? (y/N) " ;
370 z$ = INPUT$ (1)
380 z$ = UPPER$ (z$)
390 IF z$ <> "Y" THEN PRINT : GOTO 480
400 :
410 PRINT : PRINT
420 FOR k = 0 TO &HFF
430     IF t (k) < 1 THEN GOTO 460
440     PRINT RIGHT$ ("0" + HEX$ (k), 2) "|" ;
450     IF t (k) * 100 / tot MOD 2 = 0  THEN PRINT STRING$
        ( (t (k) * 100 / tot) / 2, ":") ELSE PRINT STRING$
        ( (t (k) * 100 / tot - 1) / 2, ":") "."
460 NEXT k
470 :
480 PRINT
490 END
500 :
510 ' High Nibble:    0    1    2    3    4     5     6     7
520 ON hini+1 GOSUB 600, 680, 760, 840, 920, 1000, 1080, 1160
530 IF hini > 7 THEN hini2 = hini - 7 ELSE RETURN
540 ' High Nibble:    8     9     A     B     C     D     E    F
550 ON hini2 GOSUB 1240, 1320, 1400, 1480, 1560, 1640, 1720, 1800
560 RETURN
570 '
580 ' High Nibble: 0
590 ' Low Nibble:      0     1     2     3     4     5     6     7
600 ON loni+1 GOSUB 1870, 1910, 1950, 1990, 2030, 2070, 2110, 2150
610 IF loni > 7 THEN loni2 = loni - 7 ELSE RETURN
620 ' Low Nibble:     8     9     A     B     C     D     E     F
630 ON loni2 GOSUB 2190, 2230, 2270, 2310, 2350, 2390, 2430, 2470
640 RETURN

1770 '
1780 ' High Nibble: F
1790 ' Low Nibble:       0      1      2      3      4      5      6      7
1800 ON loni+1 GOSUB 11470, 11510, 11550, 11590, 11630, 11670, 11710, 11750
1810 IF loni > 7 THEN loni2 = loni - 7 ELSE RETURN
1820 ' Low Nibble:      8      9      A      B      C      D      E      F
1830 ON loni2 GOSUB 11790, 11830, 11870, 11910, 11950, 11990, 12030, 12070
1840 RETURN
1850 '
1860 ' 00
1870 T (&H0) = T (&H0) + 1
1880 RETURN

12050 '
12060 ' FF
12070 T (&HFF) = T (&HFF) + 1
12080 RETURN
Ok

system

A>That's all, Folks!


Yours Sincerely,
"French Luser"


EOF