Melbourne House

by Philip Mitchell

THE HOBBIT DATA FORMAT
This document describes the data format of The Hobbit. It is not complete, but it gives a good start and a basic understanding. For further information, download the HOBBIT source, which may help fill in some of the blanks. Should you find any errors in this document, or have any information to add, please send me an email.


WORK IN PROGRESS

The first thing to point out is that there is more than one version of The Hobbit in circulation. This became apparent when Sean Irvine and I started to collate our information about the format. For simplicity's sake we have stuck with the version I had, so if you want to take this any further, download it from here.

Another thing that struck me when looking at the data for this game: I expected to find much more of a system, but I was disappointed to find that the game is heavily hard-coded.

address 0x6000 Alphabetical Word Index
this table is a quick lookup table into the token/word list. The table has 26 entries covering A-Z. When the user types a phrase, each word is looked up in the word list; by taking the first letter of the word and using this table, the search is sped up.
type   offset   description
WORD   00       offset to the list of words beginning with that letter; add 0x6000 to the value
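A minimal sketch of that lookup in C. It assumes `mem` is a 64 KB image of the Spectrum's memory (for example loaded from a snapshot) and that each WORD is stored low byte first, as is usual on the Z80; `read_word` and `first_word_for_letter` are hypothetical helper names, not routines from the game.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_INDEX 0x6000u  /* Alphabetical Word Index: 26 WORDs, A-Z */

/* Read a 16-bit value stored low byte first (Z80 order). */
static uint16_t read_word(const uint8_t *mem, uint16_t addr)
{
    return (uint16_t)(mem[addr] | (mem[(uint16_t)(addr + 1)] << 8));
}

/* Address where the words starting with 'letter' (A-Z) begin. */
static uint16_t first_word_for_letter(const uint8_t *mem, char letter)
{
    assert(letter >= 'A' && letter <= 'Z');
    uint16_t entry = (uint16_t)(WORD_INDEX + 2u * (unsigned)(letter - 'A'));
    return (uint16_t)(WORD_INDEX + read_word(mem, entry));
}
```

From the returned address the word list can then be scanned word by word until a match is found.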

 

address 0x6040 Word list
this is a big list of words used by the game.
When a word is referenced in the game, its value plus 0x6000 points straight at the word. The low 5 bits of each byte hold a value between 1 and 26; adding 64 gives the required letter. Bits 7, 6 and 5 are used for something else. I can't quite get my head around the format, but the following example code shows how it works. One of the bits (bit 6 of the word's last byte, in the code below) means that the following two bytes are an offset to a word which is a synonym, e.g. NE points to NORTHEAST.

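#include <stdio.h>
#include <stdlib.h>

/*
* This is practically how it is coded in ASM
* in the actual game, so it is much easier to
* look at it like this - SORRY for the messy gotos!
*/

typedef unsigned char u8 ;
typedef unsigned short u16 ;

/* Defined elsewhere: a 64 KB image of the game's memory and the
   buffer that receives the decoded word. */
extern u8   memorymap[0x10000];
extern char wordbuffer[];

void DecodeWord (u16 address)
{
u8* pSrc = &memorymap[address+0x6000];
u8* pDst = (u8*)wordbuffer;
u8  byte;
int count = 0;

L6f88:
    byte = pSrc[0] & 0x1f ;          // low 5 bits: letter 1-26
    if ( byte == 0 ) goto L6f92 ;
    *pDst++ = 64 + byte ;            // 1 -> 'A' ... 26 -> 'Z'

L6f92:
    pSrc++;
    count++;
    byte = pSrc[-1];
    if (!(byte&0x80)) goto L6f88;    // bit 7 clear: more letters follow
    if ( count == 2 ) goto L6f88;    // bit 7 on the 2nd byte does not end the word
    if ( count != 3 ) goto L6faa;
    byte = pSrc[-2];
    if (byte&0x80) goto L6f88;       // 3rd byte ends it only if the 2nd had bit 7 clear

L6faa:
    byte = pSrc[-1];
    if ((byte&0x40)) {               // bit 6: a 2-byte synonym offset follows
        // skip synonym
        // DecodeWord ( (pSrc[1]<<8)|pSrc[0] );
        pSrc+=2;
    }
    *pDst='\0';                      // terminate the decoded string
    return ;
}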
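The same end-of-word rules can be written without the gotos. This standalone sketch works on a plain byte pointer instead of the `memorymap` and `wordbuffer` globals and returns the synonym offset rather than skipping it; `DecodedWord` and `decode_word` are illustrative names, and the byte values in the test are invented to exercise the rules rather than taken from the game.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char     text[32];     /* decoded letters, NUL-terminated */
    size_t   length;       /* bytes used by the letters themselves */
    int      has_synonym;  /* bit 6 of the last byte was set */
    uint16_t synonym;      /* raw synonym offset; add 0x6000 for the address */
} DecodedWord;

/* p points at the first byte of a word in the word list. */
static DecodedWord decode_word(const uint8_t *p)
{
    DecodedWord w;
    size_t n = 0, count = 0;

    assert(p != NULL);
    memset(&w, 0, sizeof w);

    for (;;) {
        uint8_t b = p[count++];
        uint8_t letter = b & 0x1F;              /* low 5 bits: 1..26 = A..Z */

        if (letter != 0 && n < sizeof w.text - 1)
            w.text[n++] = (char)(64 + letter);

        if (!(b & 0x80))
            continue;                           /* bit 7 clear: keep going */
        if (count == 2)
            continue;                           /* bit 7 on byte 2 never ends the word */
        if (count == 3 && (p[1] & 0x80))
            continue;                           /* byte 3 ends it only if byte 2 had bit 7 clear */
        break;                                  /* any other byte with bit 7 set ends the word */
    }
    w.text[n] = '\0';
    w.length = count;

    if (p[count - 1] & 0x40) {                  /* synonym offset follows, low byte first */
        w.has_synonym = 1;
        w.synonym = (uint16_t)(p[count] | (p[count + 1] << 8));
    }
    return w;
}
```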

address 0x7295 PrintMsg action routine
a jump table for the actions used by the PrintMsg routine. There are only 23 entries, codes 0x00 to 0x16.
type   offset   description
WORD   00       function address

code  action
0x00  [ it ]
0x01
0x02
0x03  [ weapon ] ??
0x04  [ item / room ] ??
0x05  only used in message [af6e]
0x06  "You"
0x07  [ describe NPC or OBJECT ]
0x08  only used in message [af26]
0x09  [ describe an object]
0x0A  -- unused --
0x0B
0x0C  [ the ]
0x0D  new line
0x0E  "his"
0x0F  -- unused --
0x10  "You are"
0x11  [ the NPC is ]
0x12  -- unused --
0x13  only used in message [ad9d]
0x14  end of msg
0x15  end of msg
0x16  end of msg
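A sketch of reading a handler address out of this jump table, assuming `mem` is a 64 KB memory image and the WORDs are stored low byte first; the function names are hypothetical and the routine address used in the test is made up.

```c
#include <assert.h>
#include <stdint.h>

#define ACTION_TABLE 0x7295u  /* PrintMsg action jump table */
#define ACTION_COUNT 23u      /* codes 0x00-0x16 */

/* Routine address for a message byte below 0x20, read from the jump
   table (low byte first); 0 if the code has no table entry. */
static uint16_t action_routine(const uint8_t *mem, uint8_t code)
{
    assert(code < 0x20);
    if (code >= ACTION_COUNT)
        return 0;
    uint16_t entry = (uint16_t)(ACTION_TABLE + 2u * code);
    return (uint16_t)(mem[entry] | (mem[entry + 1] << 8));
}

/* Codes 0x14, 0x15 and 0x16 are listed as "end of msg". */
static int action_ends_message(uint8_t code)
{
    return code >= 0x14 && code <= 0x16;
}
```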

 

address 0x818E Location help message table
a table that has help messages for certain locations.
type   offset   description
BYTE   00       room id
WORD   01       offset to message
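A sketch of looking up a room's help message, assuming `mem` is a 64 KB memory image and WORDs are stored low byte first. The number of entries and the way the table ends are not given here, so the entry count is a caller-supplied parameter; `lookup_id` is a hypothetical name, and 0 doubles as "not found".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCATION_HELP 0x818Eu  /* BYTE room id, WORD message offset */

/* Find 'id' among the first 'count' 3-byte records (BYTE id, then a
   low-byte-first WORD) starting at 'table'; return the WORD, or 0. */
static uint16_t lookup_id(const uint8_t *mem, uint16_t table, size_t count, uint8_t id)
{
    assert(mem != NULL);
    for (size_t i = 0; i < count; i++) {
        const uint8_t *rec = mem + table + 3 * i;
        if (rec[0] == id)
            return (uint16_t)(rec[1] | (rec[2] << 8));
    }
    return 0;
}
```

The Location Table (0xB984), Object Table (0xC007), Action code table (0xC6D3), the 0xC731 table and the Location Graphics Table (0xCC00) use the same BYTE id + WORD layout, so the same helper should apply to them.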

 

address 0xA471 Goblin table?
type   offset   description
BYTE   00       object number
WORD   01       address of a variable
BYTE   03       current location
WORD   04       offset to token describing the goblin
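A sketch that unpacks one entry, assuming the entries are packed back to back at 6 bytes each and WORDs are low byte first. Neither the entry count nor the purpose of the variable is known, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GOBLIN_TABLE      0xA471u
#define GOBLIN_ENTRY_SIZE 6u   /* assumed: entries packed back to back */

typedef struct {
    uint8_t  object;     /* object number */
    uint16_t variable;   /* address of a variable */
    uint8_t  location;   /* current location */
    uint16_t token;      /* offset to token describing the goblin */
} GoblinEntry;

static GoblinEntry read_goblin(const uint8_t *mem, unsigned index)
{
    assert(mem != NULL);
    const uint8_t *p = mem + GOBLIN_TABLE + GOBLIN_ENTRY_SIZE * index;
    GoblinEntry g;
    g.object   = p[0];
    g.variable = (uint16_t)(p[1] | (p[2] << 8));
    g.location = p[3];
    g.token    = (uint16_t)(p[4] | (p[5] << 8));
    return g;
}
```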

 

address 0xAAF7 Action table
a table in which each entry is a list of four words. Bits 7 and 6 of each word's high byte are used to mean something about the next token. The entry number is used as an action in later stages of the code.
type   offset   description
WORD   00       first word, always used
WORD   02       possible second word
WORD   04       possible third word
WORD   06       possible fourth word
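A sketch that splits each word of an entry into its top two bits and the remainder, assuming entries are 8 bytes apart starting at 0xAAF7 and WORDs are low byte first. What the two flag bits mean, and how many of the remaining bits form the token, is not established; `read_action` and `ActionEntry` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACTION_WORDS 0xAAF7u   /* each entry: four WORDs = 8 bytes */

typedef struct {
    uint16_t token[4];  /* word value with bits 15-14 cleared */
    uint8_t  flags[4];  /* bits 7-6 of each high byte, as 0-3 */
} ActionEntry;

static ActionEntry read_action(const uint8_t *mem, unsigned n)
{
    assert(mem != NULL);
    const uint8_t *p = mem + ACTION_WORDS + 8u * n;
    ActionEntry e;
    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(p[2 * i] | (p[2 * i + 1] << 8));
        e.token[i] = w & 0x3FFF;
        e.flags[i] = (uint8_t)(w >> 14);
    }
    return e;
}
```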

 

address 0xACD1 Determiner Table 1
a four-entry table holding the tokens that describe the quantity of an object: THE, A, AN, SOME
type   offset   description
WORD   00       token

 

address 0xACD9 Determiner Table 2
same as Table 1 but holds - THE, THE, THE, SOME
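Because the two tables are adjacent and the same size, one slot number 0-3 picks a determiner from either. A sketch, assuming low-byte-first WORDs in a 64 KB `mem` image; when the game uses Table 2 instead of Table 1 is not documented, so the `table2` flag is a hypothetical parameter.

```c
#include <assert.h>
#include <stdint.h>

#define DETERMINERS_1 0xACD1u  /* THE, A, AN, SOME */
#define DETERMINERS_2 0xACD9u  /* THE, THE, THE, SOME */

/* Token for determiner slot 0-3, taken from Table 2 if 'table2' is set. */
static uint16_t determiner_token(const uint8_t *mem, int table2, unsigned slot)
{
    assert(slot < 4);
    uint16_t entry = (uint16_t)((table2 ? DETERMINERS_2 : DETERMINERS_1) + 2u * slot);
    return (uint16_t)(mem[entry] | (mem[entry + 1] << 8));
}
```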

 

address 0xACE1 Common Words table
a list of common words used in the PrintMsg routine to reduce a word to one byte. the words are A, AND, ARE, AT, BE, BLOW, BUT, CANNOT, CARRYING, DO, DOOR, DRAGON, FALL, FROM, HERE, I, IN, IS, IT, NOT, OF, ON, SEE, SOME, THE, THERE, THIS, TO, TOO, WHAT, WITH, YOU
type   offset   description
WORD   00       token
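The 32 words fill byte values 0x60-0x7F exactly, and 32 two-byte entries starting at 0xACE1 end at 0xAD21, where the messages begin. A sketch of both mappings, assuming the table holds the words in the alphabetical order listed above; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define COMMON_WORDS 0xACE1u   /* 32 WORD tokens, ends at 0xAD21 */

static const char *const common_words[32] = {
    "A", "AND", "ARE", "AT", "BE", "BLOW", "BUT", "CANNOT",
    "CARRYING", "DO", "DOOR", "DRAGON", "FALL", "FROM", "HERE", "I",
    "IN", "IS", "IT", "NOT", "OF", "ON", "SEE", "SOME",
    "THE", "THERE", "THIS", "TO", "TOO", "WHAT", "WITH", "YOU"
};

/* Message bytes 0x60-0x7F select a common word. */
static const char *common_word_text(uint8_t b)
{
    assert(b >= 0x60 && b <= 0x7F);
    return common_words[b - 0x60];
}

/* Address of the table slot holding that word's token. */
static uint16_t common_word_slot(uint8_t b)
{
    assert(b >= 0x60 && b <= 0x7F);
    return (uint16_t)(COMMON_WORDS + 2u * (unsigned)(b - 0x60));
}
```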

 

address 0xAD21 Start of messages
the base address of the messages. The messages are tokenised and are decoded one byte at a time.
If bit 7 of the byte is set, it starts a two-byte word reference:

bit    7 | 6 5 4 | 3 2 1 0
field  f |  mode |  word

word (bits 3-0) is the high part, and the following byte the low part, of an offset to a word:
wordoffset = 0x6000 + ((byte & 0x0F) << 8) + nextbyte

where mode is

0 nothing
1 word ends with 's'
2 end of msg
3 full stop, and end of msg
4 word ends with one of: 's', 'd', 'ing' or 'es'
5 unknown
6 unknown
7 first character of word uppercase
If the byte is < 0x20, it is a special routine that will do something, usually print something else; see the PrintMsg action routine table.
If the byte is between 0x20 and 0x5F, it is a literal ASCII character.
If the byte is between 0x60 and 0x7F, subtract 0x60 and use the result as an index into the Common Words table.
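A sketch of how a single message byte (plus the byte after it, for word references) is classified under the rules above; `MsgToken` and `classify_msg_byte` are hypothetical names and the bytes in the test are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOK_SPECIAL, TOK_CHAR, TOK_COMMON, TOK_WORD } TokenKind;

typedef struct {
    TokenKind kind;
    uint8_t   size;     /* bytes consumed: 2 for a word reference, else 1 */
    uint8_t   mode;     /* bits 6-4 of a word reference */
    uint16_t  address;  /* word-list address of a word reference */
} MsgToken;

/* Classify the message byte at p (reads p[1] too for word references). */
static MsgToken classify_msg_byte(const uint8_t *p)
{
    assert(p != NULL);
    MsgToken t = { TOK_CHAR, 1, 0, 0 };
    uint8_t b = p[0];

    if (b & 0x80) {                       /* bit 7 set: word reference */
        t.kind = TOK_WORD;
        t.size = 2;
        t.mode = (uint8_t)((b >> 4) & 0x07);
        t.address = (uint16_t)(0x6000 + (((b & 0x0F) << 8) | p[1]));
    } else if (b < 0x20) {
        t.kind = TOK_SPECIAL;             /* PrintMsg action routine */
    } else if (b < 0x60) {
        t.kind = TOK_CHAR;                /* literal ASCII character */
    } else {
        t.kind = TOK_COMMON;              /* Common Words entry b - 0x60 */
    }
    return t;
}
```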

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Word offsets read from the Common Words table at 0xACE1 (defined elsewhere). */
extern u16 CommonWords[32];

void PrintMsg ( u16 address )
{
u8*  pSrc = &memorymap[address];
u8   token;
u16  word;
int  mode;
bool eol=0;
bool dot;
int  ending;

    for (;;) {

        if ( eol ) break;

        dot = 0;
        ending=0;

        token = *pSrc++;

        if ( token&0x80 ) {
            // word reference: mode in bits 6-4, offset in bits 3-0 + next byte
            word = ((token&0x0f)<<8) | *pSrc++ ;
            mode = (token&0x7f) >> 4 ;
            DecodeWord ( word );

            switch ( mode ) {
                case 0:
                    break;
                case 1:
                    ending = 1;
                    break;
                case 2:             // NB: the mode list above swaps 2 and 3
                    eol = 1;
                    dot = 1;
                    break;
                case 3:
                    eol = 1 ;
                    break;
                case 4:
                    ending=2;
                    break;
                case 5:
                    printf( "{5}");
                    break;
                case 6:
                    printf( "{6}");
                    break;
                case 7: // uppercase first char
                    wordbuffer[0] = toupper((unsigned char)wordbuffer[0]);
                    break;
            }

            printf( " %s", wordbuffer );

            if ( dot ) printf( ".");
            if ( eol ) printf( "\n");
            if ( ending==1 ) printf( "(s)");
            if ( ending==2 ) printf( "(s|d|ing|es)");

        } else if ( token < 0x20 ) {
          // special routine: index into the PrintMsg action table
          //printf( "%s", specials[token] );
          if ( token==0x14 || token==0x15 || token==0x16 )
                eol=1;           // codes 0x14-0x16 end the message
        } else if ( token < 0x60 ) {
            // literal ASCII character
            printf( "%c", tolower(token) );
        } else {
            // 0x60-0x7F: common word
            DecodeWord ( CommonWords[token-0x60] );
            printf( " %s", wordbuffer );
        }
    }
}

address 0xB984 Location Table
table holding the memory address of a location - 80 entries
type   offset   description
BYTE   00       location id
WORD   01       offset to location information

 

address 0xBA24 Preposition Table
a five-entry table holding the tokens that describe where something is in relation to something else: OUTSIDE, IN, IN, ON, AT
type   offset   description
WORD   00       token

 

address 0xBA2E Start of locations
the base address of the locations
type   offset   description
BYTE   00       bit field containing prep, an offset into the Preposition Table
BYTE   01       if not 0xff
WORD   02       description word 3
WORD   04       description word 1
WORD   06       description word 2
WORD   08       offset to message text

more follows...
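A sketch that reads the known fields of a location record, putting the description words back into 1, 2, 3 order. It assumes low-byte-first WORDs and that `addr` is the record's address as held in the Location Table; `LocationHeader` and `read_location` are hypothetical, and the prep bits of byte 00 are left undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  flags;      /* byte 00: bit field including prep */
    uint8_t  byte1;      /* byte 01: meaning unknown ("if not 0xff") */
    uint16_t desc[3];    /* description words 1, 2, 3 */
    uint16_t message;    /* offset to message text */
} LocationHeader;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static LocationHeader read_location(const uint8_t *mem, uint16_t addr)
{
    assert(mem != NULL);
    const uint8_t *p = mem + addr;
    LocationHeader h;
    h.flags   = p[0];
    h.byte1   = p[1];
    h.desc[0] = rd16(p + 4);   /* word 1 is stored at offset 04 */
    h.desc[1] = rd16(p + 6);   /* word 2 at offset 06 */
    h.desc[2] = rd16(p + 2);   /* word 3 at offset 02 */
    h.message = rd16(p + 8);
    return h;
}
```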

 

address 0xC007 Object Table
table holding the memory address of an object - 60 entries
type   offset   description
BYTE   00       object id
WORD   01       offset to object information

 

address 0xC0BF Start of objects
the base address of the objects
type   offset   description
BYTE   0
BYTE   1        if not 0xff, the id of another object that this one is inside
BYTE   2
BYTE   3
BYTE   4
BYTE   5
BYTE   6
BYTE   7        various flags (bit field)
WORD   8        description word 1
WORD   10       description word 2
WORD   12       description word 3
WORD   14       message
BYTE   16       current location
BYTE   17       if 0xff then the object is finished
more follows...
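A sketch that reads the documented fields of an object record. It assumes low-byte-first WORDs, that the "various flags" entry belongs to byte 7, and that `addr` is an object's address as held in the Object Table; the names are hypothetical and the test values are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  container;  /* offset 1: enclosing object id, 0xFF = none */
    uint8_t  flags;      /* offset 7: various flags (assumed position) */
    uint16_t desc[3];    /* offsets 8, 10, 12: description words 1-3 */
    uint16_t message;    /* offset 14 */
    uint8_t  location;   /* offset 16: current location */
    int      finished;   /* offset 17 == 0xFF */
} ObjectHeader;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static ObjectHeader read_object(const uint8_t *mem, uint16_t addr)
{
    assert(mem != NULL);
    const uint8_t *p = mem + addr;
    ObjectHeader o;
    o.container = p[1];
    o.flags     = p[7];
    for (int i = 0; i < 3; i++)
        o.desc[i] = rd16(p + 8 + 2 * i);
    o.message   = rd16(p + 14);
    o.location  = p[16];
    o.finished  = (p[17] == 0xFF);
    return o;
}

static int object_is_inside(const ObjectHeader *o)
{
    return o->container != 0xFF;
}
```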

 

address 0xC6D3 Action code table
table holding the memory address of routines for various actions
type   offset   description
BYTE   00       id
WORD   01       offset to action routine

 

address 0xC731 ????? code Table
table holding the memory address of
type   offset   description
BYTE   00       id
WORD   01       offset to id routine

 

address 0xCC00 Location Graphics Table
table holding the memory address of the drawing information for the location graphics
type   offset   description
BYTE   00       location id
WORD   01       offset to location graphic

 

address 0xCC43 Start of Location Graphics
the base address of the location graphics

 

Notable memory addresses
various addresses that are important to something
type   address   description
BYTE   B68C      Current Object
BYTE   C0CF      Current Location
BYTE   B68B      Current Action
WORD   B6B0      Offset to current object
BYTE   6DCC      Squiggle Graphics
BYTE   6FF2      Input Buffer
BYTE   74A6      Output Buffer
BYTE   781E      Main Font
BYTE   7BF7      KeyboardMap1
BYTE   7C1F      KeyboardMap2
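The addresses as C constants, with a hypothetical helper that reads the current action, object and location from a 64 KB memory image (the WORD is assumed to be low byte first). Note that 0xC0CF is 0xC0BF + 16, which lines up with the "current location" byte of the first object record; that suggests, but does not prove, that the first object is the player.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_SQUIGGLE_GFX     0x6DCCu
#define ADDR_INPUT_BUFFER     0x6FF2u
#define ADDR_OUTPUT_BUFFER    0x74A6u
#define ADDR_MAIN_FONT        0x781Eu
#define ADDR_KEYBOARD_MAP1    0x7BF7u
#define ADDR_KEYBOARD_MAP2    0x7C1Fu
#define ADDR_CURRENT_ACTION   0xB68Bu  /* BYTE */
#define ADDR_CURRENT_OBJECT   0xB68Cu  /* BYTE */
#define ADDR_CUR_OBJECT_PTR   0xB6B0u  /* WORD */
#define ADDR_CURRENT_LOCATION 0xC0CFu  /* BYTE */

typedef struct {
    uint8_t  action;
    uint8_t  object;
    uint8_t  location;
    uint16_t object_ptr;
} GameState;

/* Read the "current" variables from a 64 KB memory image. */
static GameState read_state(const uint8_t *mem)
{
    assert(mem != NULL);
    GameState s;
    s.action     = mem[ADDR_CURRENT_ACTION];
    s.object     = mem[ADDR_CURRENT_OBJECT];
    s.location   = mem[ADDR_CURRENT_LOCATION];
    s.object_ptr = (uint16_t)(mem[ADDR_CUR_OBJECT_PTR] |
                              (mem[ADDR_CUR_OBJECT_PTR + 1] << 8));
    return s;
}
```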

NOTE: A list of objects, messages and locations can be found with the source code.