The Fidonet Nodelist Database Project ...
[19.08.2013] I have a still open task in my work queue
since a long time that didn't make any progress. |
The structure of current Fidonet Nodelist Database collects the
content of all nodelists ever published. The content has been unified.
In database structures viewpoint, this is a real relational database structure concept.
This means, whenever a new node has been listed, the new nodenumber
becomes a new record in the database. This record never changes
despite the fact, the record is listed every week in the current
nodelist. If the speed setting or another nodelist flag changes,
then a new record will be written into the database.
Another table now records all the nodelists ever imported by
Date and the Julian daynumber in the year.
A third table consists of records, that links the nodelist records
with the nodelists ever published.
Pointlists follows in principle the Nodelist structure, but only in
the principle.
First, there exist at least 5 different Pointlist formats that are documented under
FTS-5002 Pointlist Formats
Despite the fact, there are all pointlists, the structure is considerable different.
But there is one in common, what all pointlist formats shares. This is a
reference to a Nodelist record as all Points listed, are listed as downlinks
of a specific Node. So all pointlists have a reference either way to
a Nodelist Node record - the so called Bossnode.
So here we're right in the middle of a structural definition.
Each Pointlist point record requires to be referenced to an Uplink
Bossnode that is part of a Pointlist format.
The easiest ones are the two formats listed under FTS-5002 named the
Boss Format list and the Poss Format list.
Both lists uses the Boss,[AKA] reference modell.
So there is an easy referencing between a Pointlist record and
the related Nodelist record.
The difference in Boss Format and Poss Format list is, that the
Poss Format uses the Nodelist Status Point for all Points
listings where the Boss Format list allows status flags (Hold, Down, Pvt).
So the Boss Format list is the most precise list in also signal a
points status where the Poss Format only includes redundant informations.
We know, we read a Pointlist, this is signaled by the Boss keyword lines.
So all other lines are point listings. No requirement to signal
and flag them as Point in each pointlist line listing.
For transfering Boss Format or Poss Format pointlists into the
Nodelist Database only the reference changes from a Hub, Host reference
to a Bossnode reference. All other definitions are similar to a
Nodelist record listing.
At this step, we can rethink, if there is a requirement to use the
Point Nodelist status as a required flag, requiering a transfer
into the database or if we can retrieve this info from another info?
To make the story short:
Each AKA is splitted 4 dimensional: [Zone]:[Net]/[Node].[Point]
where a Node listing is still Point # 0.
All other Pointnumbers are considered to be Points.
So every time we want to retrieve the info wether a listing is a Node or a Point listing in the database, we can look into the Pointnumber field and if its Not Equal 0 then it is a Point listing.
So therefor we don't require the Point status defined in the Nodelist status field and we can use the Nodelist status field for the infos we can retrieve out of Boss Format lists - the real status of a Point listing.
This applies to other Pointlist Formats too, that includes a status field NE 'Point' for Point listings.
This format is also known as V7 pointlist format.
The Point Format list still has merged Nodelist entries together
with Points listings. So the Point Format list can be read as a copy
of a Nodelist with integrated Point records. The reference to a Bossnode
is similar to a listing of a Node under a Hub or Net Host filled
with Point records between each Node lines.
To signal Point lines as Point lines, the Point Format uses
the Point flag as a status field parameter similar to the
Poss Format listed under 2.2.1
Likewise the discussion about the Point flag in a Poss Format
Point list, this status flag is required in a Point Format list, to
differentiate a Node record from a Point record.
Incorporate Points from a Point Format list is easy if the focus is
an individual point record. It can be easily transfered like the
Boss Format or Poss Format record can be transfered.
But remember - the Point Format list also includes structural a copy of the
Nodelist(!). So a Bossnode record in the Nodelist is also listed with
a Nodelist line in the Point Format list.
Assuming, that a Nodelist record in a Point Format Pointlist
hasn't been updated properly, we potentialy have 2 different listings
for a Node in 2 lists.
The Nodelist line may differ between the Nodelist and the Pointlist listing.
This isn't a big problem, if the Sysop name in a Nodelist record doesn't
much differ from the Sysop name in a node line of the Point Format list.
But considering, that a nodenumber has been re-assigned to another sysop
in the Nodelist, but the Pointlist nether gets updated, now the
Pointlists Node record clashes with the Nodelists Node record. And this
is a problem.
To solve this problem, a strict analyse in retrieving Pointlists records
is required.
Each Bossnode listing in any Pointlist requires a matching Nodelist record to keep the database clean from orphaned Bossnode and respective Points listings.
This requirement isn't limited to Point Format pointlist records.
There are some known exceptions, where Point Format pointlists
have shortened Region and/or Host lines. This only requires to
have an exception in analysing input, once readin Region and Host lines
from Point/V7 format pointlists, that such lines ends after field 2
and doesn't follow the FTS Nodelist format lines. For the Bossnode
listings, this doesn't change anything, as Bossnodes are still listed
with complete lines, so they can be compared with the corrosponding
Nodelist record. The only exception is to take care while build up
the AKA from lines that have no comma as seperator after the Region
or Host number but followed by a carriage return sequence.
Region,31 <- Carriage Return Zone: ? Host,399 <- Carriage Return Net: 399 ,666,Just_a_Node,Simcity,Joe_Grunt,9-012-345-678,9600,VFC Node: 666 AKA: 399/666 Point,1,Point_1,Simcity,Joe_Point,-Unpublished-,300, Node: 666 AKA: 399/666.1 Point,2,Point_2,Simcity,Joes_Brother,9-345-678-901,9600,V34,XX,U,Tbc Node: 666 AKA: 399/666.2
The Fakenet-Format list is a mixture of all previously listed formats.
Its a full featured Nodelist where Bossnode listings have been shifted
to Host listings and Net Pointlistkeeper listings to Region listings.
So that is the reason, why this format is named a Fakenet Format list.
The Host- and Region- listings are fakes in relation to Nodelist listings
of Bossnodes.
The reference to a Nodelist listing of a Bossnode is made in the systemname
field, field #3 of a nodelist line within a Host line.
The definition is, that the systemname has to represent the 2-dimensional
AKA of a Bossnode.
e.g. Bossnode 2:244/1120 is defined 244/1120 in the systemname field.
So a Fakenet Format Bossnode segment can be read in the following way:
So there is one requirement to identify Fakenet Format pointlists. That is identification by Filename mask.
Pointlist lines can be retrieved from Fakenet Format lists as they've
been retrieved from Boss Format lists, respective the Nodelist status field.
The reference into a real Fidonet Network or Region is given by an addtl. definition
given in the Filemask definition and reference to a Fakenet Format pointlist.
e.g. Net 244, Region 24.
R23 Fakenet Format Exception
In the early days of R23 Fakenet Format pointlists development a 2nd method
to reference Bossnodes to a Nodelist record has been established using
the UBOSS:[2d-AKA] keyword in the Host's line as addtl. Userflag.
Extract the 2d-AKA reference from either Systemname field (default) or from the UBOSS:[2d-AKA] nodelist flags fields (R34 1992-2008).
The Fidouser Format list consists of lines with the following format of a record:
So best practice with some added value is to add the Bossnodes city to a Fidouser Format pointlist line information that maybe exact correct or nearly correct in the majority of all cases. Individual cases can be simply ignored.
This usage adds more valuable information into the transfered pointlist
into the database then all Fidouser Format pointlists get a default
meaningless value:
e.g. City -> Fidouser
The Fidonet Nodelist is the weekly distributed phonebook of all
fidonet members. The compilation consists of the Zone 1 - Zone 6, nowadays
Zone 1 - Zone 4 segments.
Each segment can be distributed seperately, but hasn't in the past, so
the complete Nodelist has been distributed week by week since years.
Pointlist distribution did happen in a different way over years.
The first attempt has been made in the early 2000's to collect regional
pointlist segments together into a combined Zone 2 pointlist segment
that also did include a Zone 6 segment.
From the other Zones until today there is no info available, that ever
pointlist segment distributions did happen.
At least from the end 80's until the early 2000's Pointlist distribution
evaluated several pointlist distribution formats. Each region had
their own specific distribution format. As also different software
used different pointlist formats, it was common, that several regions
distributed more then one different pointlist formats. It also was common,
that the Pointlist distribution format did change over the years.
At the very end the Fido-History-Project archive collected
many different Pointlist formats, multiple formats within a Regions
and a masterlist - the Zone 2 Pointlist.
The Nodelist has a uniqueness, that the Pointlists never had.
Nodelist records are unique in one week. Pointlists records aren't.
The Fido-History-Project collected Pointlists from over 10 years
of pointlist distributions, where the masterlist still did not exist.
Also since Zone 2 Pointlist distribution started, not all Pointlists
have been incorporated in the first few years. To have all pointlist
segments incorporated into the Fido-History-Project Nodelist and
Pointlist database archive, several pointlists have to be imported.
This may result in an overlapping of Pointlist records, e.g. if one
pointlist record has been imported from a regional level Fakenet format
pointlist and the same record in the same week has been imported from
the masterlist.
We know from Nodelists, that sysops may have several different Nodelist
records, e.g. for administrative AKAs and also on a multiple lines
system. But all these records emerges over weeks. Multiple identical
records from different lists aren't forseen, at least not from
the initial database concept.
Therefor, the database construct requires an addtl. marker:
From which list does the listed record comes from?
Host 2:244/0 from Nodelist Hub 2:244/1200 from Nodelist Node 2:244/1120 from Nodelist Point 2:244/1120.1 from Region 24 Bossformat list Point 2:244/1120.1 from Region 24 Fakenet format list Point 2:244/1120.1 from Zone 2 Bossformat list
From the example above we focus on the AKA's:
From the Nodelist we collect AKA's 2:244/0.0, 2:244/1200.0, 2:244/1120.0
and from the Pointlist(s) we collected 2:244/1120.1, 2:244/1120.1 and 2:244/1120.1
so we collected one record and 2 dupe records from 3 different pointlists.
Dupe checking for Pointnumbers is one possible option to merge
several listings from several pointlists into one unique Fido-History-Project
Nodelist and Pointlist archive database record with references from several
sources. All what to do is a dupecheck of an existing Point aka for ones
week Nodelist/Pointlist listing.
In case dupes will be eliminated, how can the references be displayed?
From above example 3 listings of 2:244/1120.1 can be displayed in a table
row by row. But if there is only one row, the question might be, how
these references can be displayed?
Hover info similar to the Nodelist Archive overview, where exceptions
are displayed in red, signals addtl. info if a user hovers the Nodelist
record with the mousepointer.
hovering hovered text display ------------- ---------------------------- Host 2:244/0 from Nodelist Hub 2:244/1200 from Nodelist Node 2:244/1120 from Nodelist Point 2:244/1120.1 from R24PNT, POINTS24, Z2PNT
In the case of unified Point AKA listings, the dupe checking has to
consider, that data integrity requires to be checked, that all dupe records
lists the similar content - Sysopname is identical, City is in one local area.
The biggest problem is, if one of the records doesn't match, the question
will araise, how to proceed? Eliminate the mismatching record? Mark the
mismatching record as mismatched?
Before answering this question, we probably have to ask, how such
mismatches can happen?
Well, a discrepancy between a Nodelist's Bossnode record and a corrosponding
Pointlist Bossnode line in e.g. the Point Format pointlist may happen if
the Nodelist update path is a different update path then the Pointlist
segment update path. But the Bossnode reference into a Nodelist is
still a detected requirement (see section 2.2.2 The Point- Format).
A mismatch of one Point AKA from different pointlist sources can be
assumed, that it will never happen, because support of different
Pointlist format list is handled by the Pointlistkeepers. So if they'll
distribute a specific pointlist format, another pointlist format list
will be distributed by the same Pointlistkeeper to support different
Fidonet software pointlist format styles.
Example: in the early Region 24 pointlist distribution, the Fakenet Format
list distribution was the defined major distribution path. This moved over
to the Bossformat distribution years later. Over the time, both formats
have been distributed in parallel. The content of both files was
identical, because the 2nd format has been converted from #1 distributed
pointlist.
So before we complicating processing, we can assume, that the major
Pointlists AKA listings from several different pointlist format lists
of one distribution week for a specific area are identical. There may
exist cases, where one Point AKA is listed in one pointlist, but not
in another one. But this problem can be solved, by readin all
different pointlist format lists to complete the archive with
missed records.
There exist known cases of Point listings under different Bossnodes.
So one may be listed with different Point AKA's. This doesn't differ
to multiple Nodelist listings of a Bossnode who also has a Hub, a Host and/or
RC listing in the Nodelist.
The only difference will be shown in the Nodelist search, where the
search for a sysopname will probably give more results then a result list
that follows a specific AKA search. But this phenomenon we still have
with the Node's Nodelist search.
In the case of unified Pointlist AKA listings, which lists record will be
give preference?
Example: Point 2:244/1120.1 listing in the Bossformat list shows
,1,AMBROSIA60.1,Frankfurt,Ulrich_Schroeter,+49-69-83837052,300,
and the Fakenet format list shows
,1,244/1120.1,Frankfurt,Ulrich_Schroeter,+49-69-83837052,9600,CM,MO,XA,V32B
Which record we will give the preference?
The Bossformat listing? The Fakenet Format listing, as it was long time
the major distribution version?
Ok, here we have to go deeper into the discovery of current database table
structures.
The table NODELIST lists the different Nodelists by date, year, week
The table NL_NODE lists the different AKAs, the full sysopname and the lastname
The table NL_LINE lists the extracted different Nodelist lines ever listed
with a Nodelist status, systemname, location, phone, baudrate, nodelist flags
The table NL_FRMT still has the different Lists types: Nodelist, Boss, Point, Fakenet
pointlists internal value references
The table NL_ENTRY is an index table that links the Nodelist listed under NODELIST
with the record in the table NL_LINE. Each record in NL_LINE has a reference to
one record in table NL_NODE.
Above example with 2 Point AKA listings will result in two records in table
NL_LINE. Each of these 2 records references to one record in table NL_NODE.
The NL_LINE listing isn't much a problem. But the NL_NODE listing requires
further discovery:
The NL_NODE listing for Nodes consists of detailed reference links. If one
moves from one Hub to another, a new NL_NODE listing will be created
to reference the listing under a different Hub.
At import time, we still have the Bossnodes reference (Net/Node) for a Point,
but what we didn't have is the Bossnodes reference under current Nodelist.
There is still a requirement to check the referenced Bossnode in the
corrosponding Nodelist. So here we have to collect not only, that the
Bossnode listing in current Nodelist is valid, but also what the
reference listing is. This means, under which Region a Bossnode is listed,
under which Hub a Bossnode is listed. If we have collected these infos
from the database, we can also write Point records with these infos
into the database. Problem solved.
If there exist no updates in the Nodelists from one week to the next week,
at least table NL_ENTRY will be filled with the reference links of existing
Nodelist records with the new Nodelist.
Summary:
The accuracy of Pointlist listings depends on the accuracy of Bossnode
references to a corrosponding Nodelist record for the Bossnode.
If this is once checked, the related Pointlist Point AKA records can
be assumed to be correct (minimum of mismatches).
Its more likely, that a complete Bossnode segment is orphaned
then a mismatch of one Pointline will happen.
There may exist different Point sysopnames over the time under one
Bossnode, but they'll evolve over the time. The Point#1 listing in 1999
maybe a different one then in 2009. By search for an AKA this we also see
in Nodelists did happen. By search for a sysopname, the result list
ignores listings with a different sysopname.
So there is a slight preference of unified Pointlist AKA listings. The
only question to answer, how the different references gets displayed.
One of the requirements in importing pointlists is the Nodelist
referencing for accuracy of each Bossnodes segment. If a Bossnode is no
longer listed in the Nodelist, a Pointlist segment of that Bossnode
listed in a later pointlist becomes orphaned.
All Nodelists are currently still imported, so that in an import process
for Pointlists, the referenced Nodelist still exist in the database.
But what happens with new distributions?
Often Pointlists will be published the day before or around the
Nodelist will be created. For a region the region check will be made
against the Nodelist regional segment of the upcoming Nodelist. But there
are still cases known of Nodelist's late distributions, where the Nodelist
arives the Pointlistkeepers after weeks. Importing the Pointlist doesn't
make sense, until an accurate Nodelist has been published. So for the
current Pointlist processing and if the check for a nodelist didn't
find a matching Nodelist, has to be delayed until the corrosponding
Nodelist arrives.
There is still a work queue processing of current Nodelists implemented
in the database update process that have to be used for Pointlists
import processing too. So its easy to trigger a Pointlist processing
where still the corrosponding Nodelist isn't available. The simple
solution is to set the Pointlist import to status delayed until
the corrosponding Nodelist import has been processed. Once the Nodelist
arrives and the Nodelist processing has been triggered, the delayed
Pointlist import can be processed thereafter.
1. | Each Bossnode listing in any Pointlist requires a matching Nodelist record to keep the database clean from orphaned Bossnode and respective Points listings. |
Check each pointlist revision against a corrosponding
Nodelist revision Check each bossnode in a pointlist against a corrosponding Node record in corrosponding Nodelist revision |
2. |
Each AKA is splitted 4 dimensional: [Zone]:[Net]/[Node].[Point] where a Node listing is still Point # 0. All other Pointnumbers are considered to be Points. |
So therefor we don't require the Point status defined in the Nodelist status field and we can use the Nodelist status field for the infos we can retrieve out of Boss Format lists - the real status of a Point listing. |
3. | Records in Pointlists maybe shortened | Pointlist line checks requires check for comma or line termination delimiters |
4. | Fakenet Format pointlists identification follows ... | identification by Filename mask |
5. | Fakenet 2d-AKA references |
from either Systemname field (default) -or- from the UBOSS:[2d-AKA] nodelist flags fields |
6. | Fidouser Format data completion | add the Bossnodes city and probably other infos (Phone, Baud, Nodelistflags) from the Bossnodes Nodelist info to a Fidouser Format pointlist line |
7. | Unified Point AKA listings per Nodelist week | Check if Point AKA exist otherwise add new record |
8. | Pointlists processing requires queuing | If a corrosponding Nodelist has been processed the Pointlist can be imported, otherwise processing has to be delayed |
9. | Referencing data source | Add a data source type reference to each Nodelist and Pointlist line (Nodelist, Points24, Z2PNT) |