<tbody>
<tr>
<td valign="top" id="20hour" class="time">8 pm</td>
<td valign="top" class="noBorderBot 7">
<div align="center">
<div class="movieTag">MOVIE</div>
</div>
<div class="showtime">
<div class="time">8:35 pm</div>
<div class="show">
<a yadda-yadda>Wild Hogs</a>
</div>
</div>
</td>
<td valign="top" class="noBorderBot 11">
<div class="showtime">
<div class="time">8:00 pm</div>
<div class="show">
<a yadda-yadda>The Big Bang Theory</a>
</div>
</div>
<div class="showtime">
<div class="time">8:30 pm</div>
<div class="show">
<a yadda-yadda>The Big Bang Theory</a>
</div>
</div>
</td>
...
</tr>
...
</tbody>
<tbody>
<tr><td>20:00
<td><div class=mv>20:35<br>Wild Hogs
<td><b>20:00</b><br># Big Bang Theory [2]
...
</tbody>
Noteworthy are the elimination of (to me) needless <div> blocks, the use of deprecated <b> and </b> bolding, the replacement of "The Big..." with "# Big...", and the [2] that indicates two sequential episodes of the same program. Note: 24-hour time is used.
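The title handling described above can be illustrated in isolation. The following is only a minimal sketch, not the code under review: `shorten_title`, `merge_runs` and `format_listing` are hypothetical names, and the sketch copies into caller-supplied buffers rather than editing the downloaded page in place.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace a leading "The " with "# " (hypothetical helper).
   The result is written to out, truncated to cap bytes if necessary. */
void shorten_title(const char *title, char *out, size_t cap) {
    assert(title != NULL && out != NULL && cap > 0);
    if (strncmp(title, "The ", 4) == 0)
        snprintf(out, cap, "# %s", title + 4);
    else
        snprintf(out, cap, "%s", title);
}

/* Collapse runs of identical adjacent titles in place.
   counts[k] receives the length of the k-th run.
   Returns the number of runs that remain in titles[]. */
size_t merge_runs(const char *titles[], size_t n, size_t counts[]) {
    size_t runs = 0;
    for (size_t i = 0; i < n; i++) {
        if (runs > 0 && strcmp(titles[i], titles[runs - 1]) == 0) {
            counts[runs - 1]++;          /* same title again: extend the run */
        } else {
            titles[runs] = titles[i];    /* new title: start a new run */
            counts[runs] = 1;
            runs++;
        }
    }
    return runs;
}

/* Append " [n]" to a title only when it was broadcast n > 1 times in a row. */
void format_listing(const char *title, size_t count, char *out, size_t cap) {
    assert(title != NULL && out != NULL && cap > 0);
    if (count > 1)
        snprintf(out, cap, "%s [%lu]", title, (unsigned long)count);
    else
        snprintf(out, cap, "%s", title);
}
```

Feeding it the two 8 pm entries from the sample ("The Big Bang Theory" at 8:00 and 8:30) would leave a single listing with a count of two.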
<!DOCTYPE html><html>
<head>
<style>
th, td {font-family:Arial;font-size:11pt;}
#tv {border:2px solid black;background:#F0F0F0;}
#tv thead tr th {font-weight:bold;text-align:center;}
#tv thead tr th:first-child {line-height:40px;}
#tv tbody tr td {text-align:left;vertical-align:top;border:1px solid #999999;padding:5px }
#tv tbody tr td:first-child {font-weight:bold;vertical-align:middle;background-color:#ffffff;}
.mv {font-weight:bold;color:red;}
p {font-family:Arial;font-size:11pt;}
</style>
<!--keepChan
+11:21, -HD_ABC,
+02_ABC,
+22_Comedy, -ABCME, -24_News,
+06_Ch7, -PrimeSth, -PrimeNth,
+62_7Two,
+63_7MATE, -7bravo, -7Flix, -RaceTV,
+05_WIN, -HD_WIN, -Sky, -NBN, -SthCross, -NineLife,
+83_Go!,
+82_GEM, -51_Bold, -10Capitals, -52_Peach, -TVSN, -7GTS,
+03_SBS, -HD_SBS, -WorldMovies, -HD_VICELAND, -Food, -SBSWW, -Imparja,
+34_NITV
keepChan-->
</head>
<body>
<table id=tv cellspacing=0>
<thead>
<th>0910
<th>02_ABC
<th>05_WIN
<th>83_Go!
<th>82_GEM
<th>03_SBS
<th>34_NITV
</thead>
<tbody>
<tr><td>13:00
<td>
<td><b>13:15</b><br>Saltimbanco To Luzia - 25 Years<br>Of Cirque Du Soleil In Australia<br>
<b>13:45</b><br>My Way
<td colspan=4>
...
</tbody>
</table>
<!-- Excl
7 # Addams Family
7 # Art Of Ageing
7 # Chase~
...
7 Border Security~
5 Call The Midwife
...
4 Wonders Of Scotland
2 World's Greatest Hotels
6 Woven Threads Stories From Within
7 Young Sheldon
Excl -->
<p>old:247 exp:10 new:10 out:247
</body></html>
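The `keepChan` comment in the sample above drives column selection. As a rough sketch of how such a list can be split (the function name `parse_channels` and the `keep[]` layout are assumptions made for illustration, not the reviewed code), each token maps to one column of the downloaded table, with NULL marking a column to skip:

```c
#include <assert.h>
#include <string.h>

/* Split a "+name, -name, ..." list in place (hypothetical helper).
   keep[i] points at the name without its '+' for a retained column,
   or is NULL for a column marked '-'.
   Returns the total number of columns seen, at most max. */
size_t parse_channels(char *list, const char *keep[], size_t max) {
    static const char seps[] = ", \t\n";
    size_t n = 0;
    assert(list != NULL && keep != NULL);
    for (char *tok = strtok(list, seps);
         tok != NULL && n < max;
         tok = strtok(NULL, seps)) {
        keep[n++] = (tok[0] == '+') ? tok + 1 : NULL;
    }
    return n;
}
```

Note that `strtok` modifies its input, so the list has to be a writable copy. In the program shown later, the first token ("+11:21") is read as the first and last hours of interest rather than as a channel name.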
- The single (8-10 KB) file contains an HTML document.
- It begins with a stylesheet for formatting.
- An HTML comment block identifies the 37 channels, indicating with '+' or '-' which channels (table columns) are to be retained and which are simply skipped over.
- Everything up to </head> is replicated from one day to the next.
- The table's body contents should be apparent.
- Following the table is the "exclude title" list, each entry prefixed by the number of days remaining until the entry is "forgotten" and removed from the list.
- Finally, there are some stats: how many exclusions were loaded, how many expired, how many were added, and the new tally.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <io.h>
typedef struct sExcl { // Exclude list/tree data
int days; // days until expire from list
unsigned int hash; // first 4 chars as integer for fast compare
char *ttl; // title as string
sExcl *lft; // list transforms to binary tree
sExcl *rgt;
} excl_t;
typedef struct sPR { // One program in the listings
sPR *nxt; // ptr to next program (in same hour timeslot)
excl_t cr; // filled when program added to exclude tree
int mov; // is a movie?
int start; // start time (hhmm as integer)
int cnt; // count (to combine sequential episodes into single listing)
} PR_t;
typedef struct sTD { // One table cell loaded/stored
sTD *nxt; // ptr to next in row
PR_t *sh1; // ptr to 1st program in this cell (timeslot)
} TD_t;
typedef struct sTR { // One table row loaded/stored
sTR *nxt; // ptr to next row
int hr; // 0-23 hour of this time block
TD_t *TD0; // ptr to first cell in this row
} TR_t;
char *allocBuf; // for re-use of preamble bytes in single heap allocation
char *ch[40]; // ptrs to channel identifiers (strings)
struct { // Exclude list/tree information
char *Bgn; // HTML comment tag marking begin of exclusion list
char *End; // tag marking end of exclusion list
int Dur; // max duration (days) until expire from list
int Old, Exp, New, Out; // stats (counts loaded, expired, added, saved)
excl_t *Tree; // ptr to titles after transformation to tree
} excl = { "<!-- Excl", "Excl -->", 7, };
// simple calculation of yesterday (MMDD) from today (MM DD) w/o effort for leap year
int yesterday( int mo, int dy ) {
return (!--dy) ? (("xLABCDEFGHIJK"[mo]-'@')*100) + (28+"x3303232332323"[mo]-'0') : (mo*100)+dy;
}
#define GrabFour( s ) ( ((s[0] << 8 | s[1]) << 8 | s[2]) << 8 | s[3] ) // simple hash 4 chars to int
#define twoDgt( t, u ) ((t - '0')*10 + (u - '0')) // simple two digits to int
// strcmp() but allow '~' on 1st param to act as wild card matching any extra on 2nd param
// can't use '*' because "M*A*S*H" is a valid (classic) program title often in re-runs
int cmpWild( char *kwn, char *nu ) {
while( *kwn && *kwn == *nu )
kwn++, nu++;
return *kwn == '~' ? 0 : *nu - *kwn; // equal (0) OR calc >/=/<
}
// traverse excluded title tree looking for this program title
// if found, restore full expire lifespan (7 days), return true.
// if not found, but contains a permanent "stop word", return true.
// else return false
bool exclfnc( PR_t *p ) {
excl_t *kwn = excl.Tree;
while( kwn ) {
int res = p->cr.hash - kwn->hash;
if( !res ) res = cmpWild( kwn->ttl, p->cr.ttl );
if( !res ) {
if( kwn->days < 0 ) return false;
kwn->days = excl.Dur;
return true;
}
kwn = ( res < 0 ) ? kwn->lft : kwn->rgt;
}
static char *no[] = { "Fishing", "Pickers", "Nazi", "Hitler", NULL };
for( int i = 0; no[i]; i++ ) if( strstr( p->cr.ttl, no[i] ) ) return true;
return false;
}
// local allocation of bytes of 100K+ "preamble" already loaded into heap
// (entire program uses only one true malloc at beginning)
void *alloc( size_t sz ) {
sz = (sz + (sizeof(void*) - 1)) & ~(sizeof(void*) - 1); // Round up to nxt pointer size (mask must match pointer width)
memset( allocBuf, '\0', sz );
allocBuf += sz;
return allocBuf - sz;
}
// transform array of pointers (exclusion titles as list) to binary tree
excl_t *buildTree( size_t l, size_t r, excl_t lst[] ) {
size_t m = (l + r)/2;
excl_t *nu = &lst[ m ];
if( l < m ) nu->lft = buildTree( l, m-1, lst );
if( m < r ) nu->rgt = buildTree( m+1, r, lst );
return nu;
}
// brand new program titles added to existing tree (self sorting for output)
void insertTree( excl_t *nu ) {
for( excl_t *p = excl.Tree, **pp; ; p = *pp ) {
int res = nu->hash - p->hash;
if( !res ) res = strcmp( nu->ttl, p->ttl );
if( !res ) return;
pp = ( res < 0 ) ? &p->lft : &p->rgt;
if( !*pp ) { (*pp = nu)->days = excl.Dur + 1; return; } // extra day of life for now
}
}
// set up array (list) of ptrs to "exclude titles" section of my HTML file
// each lifespan & title, and hash value stored individually
// notice local allocation (re-use) of preamble section of loaded buffer
// finally, transform that list into tree for searching and additions
excl_t *procExcl( char *p ) {
p = strchr( strstr( p, excl.Bgn ), '\n' ) + 1;
char *cp = strstr( p, excl.End );
cp[0] = '\0';
while( --cp > p ) excl.Old += *cp == '\n'; // count titles
// titles to pointer array
excl_t *List = (excl_t*)alloc( excl.Old * sizeof *List ), *pl = List;
for( cp = p; ( cp = strtok( cp, "\n") ) != NULL; cp = NULL, pl++ )
pl->days = cp[0] - '1', pl->ttl = cp + 2, pl->hash = GrabFour( pl->ttl );
return buildTree( 0, excl.Old - 1, List );
}
// my HTML file contains region of comma separated channel names
// each name corresponds to one column of today's program listings (each <TD>)
// names (ie: "channels") beginning with '+' are retained
// names beginning with '-' are simply skipped over (ignored)
// raw data has 37(?) columns
// all 37 pointers are significant here, including NULL pointers; compacting happens later.
// mark the end of my "permanent" header HTML for later duplication to output file.
char *procChan( char *bp ) {
bp = strstr( bp, "<!--keepChan" ) + 12; // beyond token
char *ep = strstr( bp, "keepChan-->" );
char *cp = (char*)alloc( (ep-bp) + 1 ); // copy to chop up and manipulate
memcpy( cp, bp, (ep-bp) );
// chop up copy selectively filling ch[]
for( int i = 0; ( cp = strtok( cp, ", \t\n") ) != NULL; i++, cp = NULL )
if( *cp == '+' ) ch[i] = cp; // ch[] null ptrs used to skip don't care channels
ep = strstr( ep, "</head>" ) + 7; // Having loaded yesterday, isolate its HTML header for re-use
*ep++ = '\0';
return ep; // exclusions follow
}
// digest today's program listings from web's HTML assembling internal tree (rows & cols) of listings
// many things happen here!!
// - my HTML (for col 0) is the start & stop hours of interest. web page hours earlier/later are ignored.
// - advance the buffer pointer to <tbody> marker, then...
// - get the row's hour from web provided '<TR id=13>' (indicating 1PM hour block)
// - keep scanning to ignore downloaded listings earlier than interest.
// - note: as single pass proceeds, terminate on </tbody> OR listing too late to be of interest.
// looping sensitive to "<tr>", "<td>" and "<div...>" HTML tokens
// - simplest ("<tr>") simply starts (allocates) a new row.
// - "<td>" is selective. Only retained columns (non NULL channels) are preserved; others are simply skipped over.
// also, "daytime children's programs" on one particular channel are skipped over.
// these "cells" are attached to the current row's LL of columns
// - "<div...>" delimiters are used for several layout and data purposes:
// "align" is ignored
// "class=.." herald either "SPORT" or "MOVIE". I choose to drop sport programs, but retain movie titles
// otherwise, next field is program start time as "HH:MM", so ':' is a search target to extract minutes.
// web format is then a lengthy anchor ("<a...>") that is skipped, followed by the program title.
//
// - Program title text:
// Lots of titles of "News" are ignored (I know when the 6 o'clock news is broadcast)
// All movie titles are retained, but only "show" titles NOT found in exclusion list.
// Too often, 2+ sequential episodes of same series are broadcast. Sequential same titles are compressed.
// If program title retained, its info is attached to cell's LL of programs (possibly multiple titles in cell).
TR_t *digest( char *sp ) {
TR_t *TR0 = NULL, *TRn, *TRprv = NULL;
TD_t *TDn;
PR_t *pTail;
PR_t *prev[40] = { 0 };
int cCnt = 0, hr, earliest = atoi( ch[0] ), latest = atoi( ch[0] + 4 ); // Earliest to retain
sp = strstr( sp, "<tbody" );
//printf( "%.190s\n", sp ); getchar();
do { sp = strstr( ++sp, "<tr>" ); } while( ( hr = atoi( strstr( sp, "id=\"" ) + 4 ) ) < earliest );
for( ; strncmp( sp, "</tbody", 7 ); sp = strchr( ++sp, '<' ) ) {
if( sp[1] == 'd' ) { // "<div"
//printf( "DIV ... %.90s\n", sp ); getchar();
if( cmpWild( "align~", sp + 5 ) == 0 ) {}
else if( cmpWild( "class=\"sport~", sp + 5 ) == 0 ) sp = strstr( sp + 290, "</a>" );
else {
//printf( "Now %.90s\n", sp ); getchar();
PR_t buf = { 0 };
if( sp[12] == 'm' ) buf.mov = 1;
sp = strchr( sp, ':' );
buf.start = TRn->hr * 100 + twoDgt( sp[1], sp[2] );
buf.cr.ttl = sp = strchr( strstr( sp, "<a " ) + 50, '>' ) + 1; // end of anchor
sp = strchr( sp, '<' );
*sp = '\0';
//printf( "@%4d Showtitle: %s\n", buf.start, buf.cr.ttl ); getchar();
if( ( buf.cr.hash = GrabFour( buf.cr.ttl ) ) == (unsigned int)GrabFour( "The " ) )
buf.cr.ttl += 2, buf.cr.ttl[0] = '#', buf.cr.hash = GrabFour( buf.cr.ttl );
if( strstr( buf.cr.ttl, " News" ) == 0 && ( buf.mov || !exclfnc( &buf ) ) ) {
if( prev[cCnt] && strcmp( buf.cr.ttl, prev[cCnt]->cr.ttl ) == 0 )
prev[cCnt]->cnt++;
else {
PR_t *p = (PR_t*)alloc( sizeof *p );
if( !TDn->sh1 ) pTail = TDn->sh1 = p; else pTail = pTail->nxt = p;
memcpy( p, &buf, sizeof *p );
prev[cCnt]= p;
}
}
}
} else if( sp[1] == 't' ) { // "<td" or "<tr"
if( sp[2] == 'd' ) { // "<td"
//printf( "TD... %.90s\n", sp ); getchar();
if( ch[cCnt] ) {
TD_t *p = (TD_t *)alloc( sizeof *p );
if( !TRn->TD0 ) TDn = TRn->TD0 = p; else TDn = TDn->nxt = p;
}
if( cCnt == 0 || ch[cCnt] == NULL || ( cCnt == 3 && TRn->hr < 19 ) )
sp = strstr( sp, "</td" ); // speed to end of cell.
cCnt++;
} else { // "<tr"
//printf( "TR ... %.90s\n", sp ); getchar();
if( hr > latest ) break;
TR_t *p = (TR_t *)alloc( sizeof *p );
if( !TR0 ) TRn = TR0 = p; else TRn = TRn->nxt = p;
p->hr = hr++;
cCnt = 0;
}
}
}
// Trim the fat - Condense to channels used
// Now that NULL ptrs have been used to skip web columns, condense array of not NULL ptrs
for( int i = 1, j = 1; i < sizeof ch/sizeof ch[0]; i++ )
if( ch[i] ) // Change prefix '+' to '-' for later use
ch[i][0] = '-', ch[j++] = ch[i], ch[i] = NULL;
// Go through all cols of all rows eliminating rows that have no programs retained during that hour
for( TRn = TR0; TRn; TRn = TRn->nxt ) {
int hrUsed = 0, i; // 'i' declared here: the index of the for() above is out of scope
for( i = 1, TDn = TRn->TD0->nxt; TDn; TDn = TDn->nxt, i++ )
if( TDn->sh1 )
hrUsed++, ch[i][0] = '+'; // back to '+' cuz column has 1+ titles
if( hrUsed ) // titles retained during this hour?
TRprv = TRn; // remember this row, and go on to nxt
else if( TRn == TR0 ) // is this the 1st row??
TR0 = TR0->nxt; // discard it
else
TRprv->nxt = TRn->nxt; // abandon this row
}
// now have active rows with active cells (columns) in a tree
return TR0; // pointer to 1st active row
}
/* HTML OUTPUT functions that write "my" file */
// wrap one long program title by inserting <br>
// uses single static buffer just before it is output
char *wrap( char *p ) {
int i, m = strlen( p ) / 2;
static char rVal[100];
if( m > 12 )
for( int l = m-1, r = m; l > 0; l--, r++ )
if( p[i = r] == ' ' || p[i = l] == ' ' ) {
sprintf( rVal, "%.*s<br>%s", i, p, p + i + 1 );
return rVal;
}
return p;
}
// "in order" recursive traversal of excluded title tree to output lines as sorted list
void pubExcl( excl_t *p ) {
if( p->lft ) pubExcl( p->lft );
if( p->days ) {
char sep = '\t';
if( p->days > excl.Dur ) p->days = excl.Dur, sep = ' ', excl.New++;
if( p->days < 0 ) p->days = 0;
printf( "%d%c%s\n", p->days, sep, p->ttl );
excl.Out++;
} else excl.Exp++;
if( p->rgt ) pubExcl( p->rgt );
}
// fancy-pants use of HTML colspan to unclutter output table
void pubSpans( int n, char *sufx ) {
if( n ) printf( n == 1 ? "\n\t<td> " : "\n\t<td colspan=%d> ", n );
printf( "%s", sufx );
}
// publish one cell's program titles (a movie?) with start time and title
// if not a movie, publishing means add to list of exclusions for tomorrow. Publish ONCE!
void pubCell( PR_t *p ) {
while( p ) {
char *fmt = p->mov ? "<div class=mv>%02d:%02d<br>%s" : "<b>%02d:%02d</b><br>%s";
printf( fmt, p->start/100, p->start%100, wrap( p->cr.ttl ) );
if( p->cnt ) printf( " [%d]", p->cnt + 1 );
if( !p->mov && p->cr.ttl[2] ) insertTree( &p->cr );
if( ( p = p->nxt ) != NULL ) printf( "<br>\n\t\t" );
}
}
// iteratively publish all the cells (channels) of one row
// if next cell has no titles, then use fancy-pants horizontal spanning
void pubTrow( TD_t *cb ) {
int i = 1, nSpan = 0;
for( ; cb; cb = cb->nxt, i++ )
if( cb->sh1 )
pubSpans( nSpan, "\n\t<td>" ), nSpan = 0, pubCell( cb->sh1 );
else nSpan += ( ch[i][0] == '+' ); // Only if column 'active'
pubSpans( nSpan, "\n" );
}
// iteratively publish all the rows (active hours) of the tree
void pubTbody( TR_t *hb ) {
for( ; hb; hb = hb->nxt ) {
printf( "\n<tr><td>%02d:00", hb->hr );
pubTrow( hb->TD0->nxt );
}
}
// publish the table headers (only channels that have a new program or a movie today)
void pubThead( void ) {
for( int x = 0; ch[x]; x++ )
if( ch[x][0] == '+' )
printf( "\t<th>%s\n", ch[x] + 1 ); // without "+"
}
// publish today's entire distilled listings
// start a new file
// output my HTML preamble that was loaded from yesterday's file
// output table preamble, today's listings, and exclusion list to use tomorrow
void publish( TR_t *hbs, char *yday, int mmdd, char *fName ) {
int saved = _dup( fileno( stdout ) );
sprintf( fName, "tvg %04d.html", mmdd );
freopen( fName, "wt", stdout );
sprintf( ch[0], "+%04d", mmdd );
printf( "%s\n<body>\n", yday ); // yesterday's "<head>" into today's version
puts( "<table id=tv cellspacing=0>" );
puts( "<thead>" ); pubThead( ); puts( "</thead>" );
puts( "<tbody>" ); pubTbody(hbs); puts( "</tbody>" );
puts( "</table>" );
puts( excl.Bgn ); pubExcl( excl.Tree ); puts( excl.End );
printf( "<p>old:%d exp:%d new:%d out:%d\n", excl.Old, excl.Exp, excl.New, excl.Out );
puts( "</body></html>" );
fclose( stdout );
_dup2( saved, 1 );
}
// measure, allocate heap and load web HTML file
// OR
// measure, allocate chunk of web loaded heap for local use to load MY HTML from yesterday
char *load( char *name, void*(fnc)(size_t) ) {
FILE *fp = fopen( name, "rt" );
if( fp == NULL ) { puts("open bad"); getchar(); exit(1); } // cannot continue without the file
struct stat inf;
fstat( fileno(fp), &inf );
char *p = (char*)fnc( inf.st_size );
fread( p, sizeof *p, inf.st_size, fp );
fclose( fp );
return p;
}
// - access and load web HTML file (300-400Kb)
// - find marker and get the file's listing's date (about 8000 bytes in)
// - calculate and load yesterday's version of my HTML
// - process the exclusion list of program titles
// - digest today's web listings into my tree, then publish those severely reduced listings
// - launch my editor on that 8Kb version (that has a one click "browser" facility)
void main( void ) {
char fName[30] = "./tvg.html";
char *Buffer, *p = allocBuf = Buffer = load( fName, malloc );
if( ( p = strstr( p, "var guideDate = \"" ) ) == NULL ) exit(1);
int mm = twoDgt( p[19], p[20] ), dd = twoDgt( p[17], p[18] );
sprintf( fName, "tvg %04d.html", yesterday( mm, dd ) );
char *yDayBuf = load( fName, alloc );
excl.Tree = procExcl( procChan( yDayBuf ) ); // used channels and excluded titles
publish( digest( Buffer + 89000 ), yDayBuf, mm*100+dd, fName ); // TABLESTART offset
sprintf( Buffer, "start \"C:/Program Files (x86)/EditPlus 2/Editplus.exe\" \"./%s\"", fName );
system( Buffer );
}
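The `yesterday()` helper in the listing above deliberately ignores leap years. One possible alternative, shown here only as a sketch, is to let the standard library's `mktime` normalise the date. The name `yesterday_mmdd` and the extra `year` parameter are assumptions: the listing only extracts the month and day, so the year would have to come from somewhere else.

```c
#include <assert.h>
#include <time.h>

/* Return yesterday's date as MMDD, or -1 if the date cannot be represented.
   mktime() normalises out-of-range fields, so a day of 0 rolls back to
   the last day of the previous month (and year, if needed). */
int yesterday_mmdd(int year, int mo, int dy) {
    struct tm t = { 0 };
    assert(mo >= 1 && mo <= 12 && dy >= 1 && dy <= 31);
    t.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    t.tm_mon = mo - 1;         /* struct tm months run 0..11 */
    t.tm_mday = dy - 1;        /* may be 0; mktime() fixes it up */
    t.tm_hour = 12;            /* midday keeps clear of DST transitions */
    t.tm_isdst = -1;           /* let the library decide about DST */
    if (mktime(&t) == (time_t)-1)
        return -1;
    return (t.tm_mon + 1) * 100 + t.tm_mday;
}
```

The trade-off is a dependency on `<time.h>` and on knowing the year, in exchange for correct handling of 29 February.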