The Hobbit Data Format

Melbourne House

by Philip Mitchell

This document describes the data format used by The Hobbit. It is not complete, but it gives a good starting point for understanding the game. For further information, download the HOBBIT source, which may help fill in some of the blanks. Should you find any errors in this document, or have any information to add, please send me an email.

WORK IN PROGRESS

The first thing to point out is that there is more than one version of The Hobbit in circulation. This became apparent when Sean Irvine and I started to collate our information on the format. For simplicity's sake we have stuck with the version I had, so if you want to take this any further download it from here.

Another thing that struck me when looking at the data for this game: I expected to find much more of a system. However, I was disappointed to find that the game is very hard-coded.

address	0x6000	Alphabetical Word Index
	This table is a quick lookup table into the token/word list. The table has 26 entries covering A-Z. When the user types a phrase, each word is looked up in the word list. Taking the first letter of the word and using this table speeds up the search.

type	offset	description
WORD	00	Offset to list of words beginning with same letter, add 0x6000 to the value.
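
As an illustration of the index lookup described above, here is a minimal sketch. It assumes the 26 entries are stored back to back as 2-byte little-endian words (the Z80 byte order) and that `mem` is a full 64K image of the game; the helper names `read_word` and `first_word_address` are made up for this example and do not come from the game.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a 64K memory image. */
static uint16_t read_word(const uint8_t *mem, uint16_t addr)
{
    return (uint16_t)(mem[addr] | (mem[(uint16_t)(addr + 1)] << 8));
}

/* Hypothetical helper: address of the first word beginning with `letter`.
 * Assumes one 2-byte entry per letter, A first, starting at 0x6000. */
static uint16_t first_word_address(const uint8_t *mem, char letter)
{
    assert(letter >= 'A' && letter <= 'Z');
    uint16_t entry = (uint16_t)(0x6000 + 2 * (letter - 'A'));
    /* The stored value is an offset, so 0x6000 is added to it. */
    return (uint16_t)(0x6000 + read_word(mem, entry));
}
```

A parser would then start scanning the word list at the returned address instead of at the top of the list.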

address	0x6040	Word list
	This is a big list of words used by the game. When a word is used in the game, it is stored as a value which, with 0x6000 added, points straight to the word. The low 5 bits of each byte hold a value between 1 and 26; adding 64 gives the required letter. Bits 7, 6 and 5 are used for something else. I can't quite get my head around the format, but the following example code shows how it works. One of the bits means that the following two bytes are an offset to a word which is a synonym, e.g. NE points to NORTHEAST.

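#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>   /* toupper, tolower (used by PrintMsg) */
#include <stdbool.h> /* bool (used by PrintMsg) */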

/*
* This is practically how it is coded in ASM
* in the actual game, so it is much easier to
* look at it like this - SORRY for the messy gotos!
*/

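typedef unsigned char u8 ;
typedef unsigned short u16 ;

/* Storage the routines rely on. memorymap is assumed to hold a 64K
 * snapshot of the game; CommonWords is assumed to hold the 32 tokens
 * from the Common Words table at 0xACE1. */
u8 memorymap[0x10000];
u16 CommonWords[32];
char wordbuffer[64];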

void DecodeWord (u16 address)
{
u8* pSrc = &memorymap[address+0x6000];
u8* pDst = (u8*)wordbuffer;
u8 byte;
int count = 0;

L6f88:
    byte = pSrc[0] & 0x1f ;
    if ( byte == 0 ) goto L6f92 ;
    *pDst++ = 64 + byte ;

L6f92:
    pSrc++;
    count++;
    byte = pSrc[-1];
    if (!(byte&0x80)) goto L6f88;
    if ( count == 2 ) goto L6f88;
    if ( count != 3 ) goto L6faa;
    byte = pSrc[-2];
    if (byte&0x80) goto L6f88;

L6faa:
    byte = pSrc[-1];
    if ((byte&0x40)) {
        // skip synonym
        // DecodeWord ( (pSrc[1]<<8)|pSrc[0] );
        pSrc+=2;
    }
    *pDst = '\0';   /* terminate the decoded word */
    return ;
}
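
The per-byte part of the format can be looked at on its own. The following is a small sketch, not taken from the game: it splits one word-list byte into its letter (low 5 bits) and its three flag bits (bits 7, 6 and 5). The names `letter_of` and `flags_of` are hypothetical, and the meaning of the individual flag bits is left open, as in the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Low 5 bits hold the letter: 1..26 maps to 'A'..'Z'.
 * A value of 0 means the byte carries no letter. */
static char letter_of(uint8_t b)
{
    uint8_t v = b & 0x1F;
    assert(v <= 26);               /* values above 26 are not expected */
    return v ? (char)(64 + v) : '\0';
}

/* Bits 7, 6 and 5, shifted down to the range 0..7. */
static unsigned flags_of(uint8_t b)
{
    return b >> 5;
}
```

For example, the byte 0x0E decodes to 'N' with no flags set, and 0xC5 decodes to 'E' with bits 7 and 6 set.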

address	0x7295	PrintMsg action routine
	a jump table for the actions used in the PrintMsg routine. there are only 23 entries

type	offset	description
WORD		function address

0x00	[ it ]
0x01
0x02
0x03	[ weapon ] ??
0x04	[ item / room ] ??
0x05	only used in message [af6e]
0x06	"You"
0x07	[ describe NPC or OBJECT ]
0x08	only used in message [af26]
0x09	[ describe an object ]
0x0A	-- unused --
0x0B
0x0C	[ the ]
0x0D	new line
0x0E	"his"
0x0F	-- unused --
0x10	"You are"
0x11	[ the NPC is ]
0x12	-- unused --
0x13	only used in message [ad9d]
0x14	end of msg
0x15	end of msg
0x16	end of msg

address	0x818E	Location help message table
	a table that has help messages for certain locations.

type	offset	description
BYTE	00	room id
WORD	01	offset to message
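
Several tables in this document share this BYTE id plus WORD layout. A minimal lookup sketch follows, under the assumptions that records are 3 bytes each, stored back to back, with the WORD in little-endian order. The number of records is passed in by the caller because the table length and any terminator are not documented here; `lookup_by_id` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Search `count` 3-byte records (BYTE id, little-endian WORD value).
 * Returns 1 and stores the WORD in *out when the id is found, else 0. */
static int lookup_by_id(const uint8_t *table, size_t count,
                        uint8_t id, uint16_t *out)
{
    assert(table != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        const uint8_t *rec = table + 3 * i;
        if (rec[0] == id) {
            *out = (uint16_t)(rec[1] | (rec[2] << 8));
            return 1;
        }
    }
    return 0;
}
```

The value returned in `*out` is the raw WORD from the table; what base address it is relative to is not assumed by this sketch.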

address	0xA471	Goblin table?

type	offset	description
BYTE	00	object number
WORD	01	address of a variable
BYTE	03	current location
WORD	04	offset to token describing the goblin

address	0xAAF7	Action table
	a table where each entry is a list of up to four words. In the high byte of each word, bits 7 and 6 mean something about the next token. The entry number is used as the action in later stages of the code.

type	offset	description
WORD	00	first word, always used
WORD	02	possible second word
WORD	04	possible third word
WORD	06	possible fourth word
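
To make the entry layout concrete, here is a rough sketch of reading one word from an action entry and separating the two flag bits from the rest. It assumes 8-byte entries starting at 0xAAF7, little-endian words, and that the remaining 14 bits form the token value; that last point is a guess, not something established above. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ACTION_TABLE 0xAAF7u   /* base address given in this document */

/* Read word `slot` (0..3) of action entry `action` from a 64K image. */
static uint16_t action_word(const uint8_t *mem, unsigned action, unsigned slot)
{
    assert(slot < 4);
    uint32_t addr = ACTION_TABLE + 8u * action + 2u * slot;
    assert(addr + 1 < 0x10000u);
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* Bits 7 and 6 of the high byte, as a value 0..3. */
static unsigned word_flags(uint16_t w)
{
    return w >> 14;
}

/* Assumed: the low 14 bits are the token value. */
static uint16_t word_token(uint16_t w)
{
    return (uint16_t)(w & 0x3FFF);
}
```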

address	0xACD1	Determiner Table 1
	a four entry table that has the tokens to describe the quantity of an object - THE, A, AN, SOME

	type	offset	description
	WORD	00	token

address	0xACD9	Determiner Table 2
	same as Table 1 but holds - THE, THE, THE, SOME

address	0xACE1	Common Words table
	a list of common words used in the PrintMsg routine to reduce a word to one byte. the words are A, AND, ARE, AT, BE, BLOW, BUT, CANNOT, CARRYING, DO, DOOR, DRAGON, FALL, FROM, HERE, I, IN, IS, IT, NOT, OF, ON, SEE, SOME, THE, THERE, THIS, TO, TOO, WHAT, WITH, YOU

	type	offset	description
	WORD	00	token

address	0xAD21	Start of messages
	The base address of the messages. The messages are tokenised and are read one byte at a time.

if bit 7 of the byte is set then

f	mode			word
7	6	5	4	3	2	1	0

f is bit 7, mode is bits 6 to 4 and word is bits 3 to 0. word is the high part of an offset to a word, and the following byte is the low part.
wordoffset = 0x6000 + ((byte & 0x0F) << 8) + nextbyte

where mode is

0	nothing
1	word ends with 's'
2	end of msg
3	full stop, and end of msg
4	word ends with 's', 'd', 'ing' or 'es'
5	unknown
6	unknown
7	First character of word uppercase

if the byte is below 0x20 then it is a special routine that will do something, usually print something else. see the PrintMsg action routine table

if the byte is between 0x20 and 0x5F then it is a literal ASCII char

if the byte is between 0x60 and 0x7F then subtract 0x60 from it and use the result as an index into the common words table
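
The byte ranges above can be summarised as a small classifier. This is only a sketch of the rules as written in this document; the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOK_SPECIAL, TOK_CHAR, TOK_COMMON, TOK_WORD } TokenKind;

/* Decide how a message byte should be handled. */
static TokenKind classify(uint8_t b)
{
    if (b & 0x80) return TOK_WORD;     /* bit 7 set: word reference    */
    if (b < 0x20) return TOK_SPECIAL;  /* PrintMsg action routine      */
    if (b < 0x60) return TOK_CHAR;     /* literal ASCII character      */
    return TOK_COMMON;                 /* 0x60..0x7F: common word      */
}

/* Address of the referenced word, from the token byte and the next byte. */
static uint16_t word_address(uint8_t b, uint8_t next)
{
    assert(b & 0x80);
    return (uint16_t)(0x6000 + ((b & 0x0F) << 8) + next);
}

/* Mode field, bits 6 to 4 of a word token. */
static unsigned word_mode(uint8_t b)
{
    return (b >> 4) & 7;
}

/* Index into the common words table for a byte in 0x60..0x7F. */
static unsigned common_index(uint8_t b)
{
    assert(classify(b) == TOK_COMMON);
    return b - 0x60u;
}
```

For instance, the pair 0xF3 0x21 is a word token with mode 7 that refers to the word at 0x6321.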

void PrintMsg ( u16 address )
{
u8* pSrc = &memorymap[address];
u8   token;
u8   mode;
u16 word;
bool eol=0;
bool dot;
int ending;

    for (;;) {

        if ( eol ) break;

        eol = 0;
        dot = 0;
        ending=0;

        token = *pSrc++;
        address++;

        if ( token&0x80 ) {

            word = ((token&0x0f)<<8) | *pSrc++ ;
            mode = (token&0x7f) >> 4 ;
            DecodeWord ( word );

            switch ( mode ) {
                case 0:
                    break;
                case 1:
                    ending = 1;
                    break;
                case 2:
                    eol = 1;
                    dot = 1;
                    break;
                case 3:
                    eol = 1 ;
                    break;
                case 4:
                    ending=2;
                    break;
                case 5:
                    printf( "{5}");
                    break;
                case 6:
                    printf( "{6}");
                    break;
                case 7: // uppercase char
                    wordbuffer[0] = toupper(wordbuffer[0]);
                    break;
            }

            printf( " %s", wordbuffer );

            if ( dot ) printf( ".");
            if ( eol ) printf( "\n");
            if ( ending==1 ) printf( "(s)");
            if ( ending==2 ) printf( "(s|d|ing|es)");

        } else if ( token < 0x20 ) {
          // special routine
          //printf( "%s", specials[token] );
          if ( token==0x14 || token==0x15 || token==0x16 )
                eol=1;
        } else if ( token < 0x60 ) {
            // print char
            printf( "%c", tolower(token) );
        } else {
            // print common word
            DecodeWord ( CommonWords[token-0x60] );
            printf( " %s", wordbuffer );
        }
    }

}

address	0xB984	Location Table
	table holding the memory address of a location - 80 entries

type	offset	description
BYTE	00	location id
WORD	01	offset to location information

address	0xBA24	Preposition Table
	a five entry table that has the tokens to describe where something is in relation to something else - OUTSIDE, IN, IN, ON, AT

	type	offset	description
	WORD	00	token

address	0xBA2E	Start of locations
	the base address of the locations

type	offset	description
BYTE		bit field, prep is an offset into the preposition table
				prep
7	6	5	4	3	2	1	0
BYTE		if not 0xff
WORD		description word 3
WORD		description word 1
WORD		description word 2
WORD		offset to message text

more follows...

address	0xC007	Object Table
	table holding the memory address of an object - 60 entries

type	offset	description
BYTE	00	object id
WORD	01	offset to object information

address	0xC0BF	Start of objects
	the base address of the objects

type	offset	description
BYTE		if not 0xff then it is the id of another object that this one is inside
BYTE		various flags
7	6	5	4	3	2	1	0
WORD		description word 1
WORD		description word 2
WORD		description word 3
WORD		message
BYTE		current location
BYTE		if 0xff then object finished

more follows...
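
As a sketch of how the fixed part of an object record might be read, the following assumes the fields listed above are stored contiguously, in the order shown, with little-endian words. That gives an 11-byte header before the end marker; since the record is only partly documented, treat both the layout and the names `ObjectHeader` and `parse_object` as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  container;   /* 0xFF, or id of the object this one is inside */
    uint8_t  flags;       /* various flags, meaning not documented        */
    uint16_t desc[3];     /* description words 1 to 3                     */
    uint16_t message;     /* message                                      */
    uint8_t  location;    /* current location                             */
} ObjectHeader;

/* Read a little-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse the assumed 11-byte fixed header of an object record. */
static ObjectHeader parse_object(const uint8_t *p)
{
    assert(p != NULL);
    ObjectHeader o;
    o.container = p[0];
    o.flags     = p[1];
    o.desc[0]   = rd16(p + 2);
    o.desc[1]   = rd16(p + 4);
    o.desc[2]   = rd16(p + 6);
    o.message   = rd16(p + 8);
    o.location  = p[10];
    return o;
}
```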

address	0xC6D3	Action code table
	table holding the memory address of routines for various actions

type	offset	description
BYTE	00	id
WORD	01	offset to action routine

address	0xC731	????? code Table
	table holding the memory address of

type	offset	description
BYTE	00	id
WORD	01	offset to id routine

address	0xCC00	Location Graphics Table
	table holding the memory address of the drawing information for each location's graphics

type	offset	description
BYTE	00	Location Id
WORD	01	offset to location graphic

address	0xCC43	Start of Location Graphics
	the base address of the location graphics

Notable memory addresses
	various addresses that are important

type	address	description
BYTE	B68C	Current Object
BYTE	C0CF	Current Location
BYTE	B68B	Current Action
WORD	B6B0	Offset to current object
BYTE	6DCC	Squiggle Graphics
BYTE	6FF2	Input Buffer
BYTE	74A6	Output Buffer
BYTE	781E	Main Font
BYTE	7BF7	KeyboardMap1
BYTE	7C1F	KeyboardMap2

NOTE: A list of objects, messages and locations can be found with the source code.