Hi Dan,
Thank you for your interest in our jtv format.
I am the developer who created the jtv format. We provide a DirectShow filter for this format. However, I understand you do not use DirectShow. So here goes:
JTV file
JTV is a loose container. We use audio and video elementary streams from DirectShow filters (basically the Microsoft MPEG-2 Demultiplexer). The jtv file itself is just a stub file that has some basic info, such as the file names of other supporting files that can be found in the same folder. The file paths saved in the jtv file are all relative to the current folder. The jtv file also contains the start and stop times of the recording, in the form of REFERENCE_TIME values (the DirectShow 64-bit time stamp type, in units of 100 nanoseconds).
The JTV file consists of a header, as defined below,
class JRTVRecordingStubHeader
{
public:
char m_cID[4]; // file ID. must be "JTVS"
long m_lFileVersion;
REFERENCE_TIME m_aryrtStartTime[NumOfStreams]; // start time of each stream; NumOfStreams is 2 (see StreamIndex below)
REFERENCE_TIME m_aryrtStopTime[NumOfStreams]; // stop time of each stream
};
and four relative file paths for the format file, the index file, the audio data file, and the video data file.
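In case it is useful, here is a minimal sketch of reading the stub header with the structure above. It assumes 1-byte structure packing, little-endian byte order, and a 32-bit long (as on Windows); ReadStubHeader is just an illustrative name, not part of our code.
#include <cstdio>
#include <cstring>
bool ReadStubHeader(const char *pszPath, JRTVRecordingStubHeader &hdr)
{
    FILE *pFile = fopen(pszPath, "rb");
    if (pFile == NULL)
        return false;
    // read the fixed-size header and validate the file ID
    bool bOK = fread(&hdr, sizeof(hdr), 1, pFile) == 1 &&
               memcmp(hdr.m_cID, "JTVS", 4) == 0;
    fclose(pFile);
    return bOK; // the four relative paths follow the header
}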
The recording usually contains one audio stream and one video stream. It may be audio only or video only (i.e., one of the streams may be missing).
The actual audio and video streams are saved in two logical files - audio in a .jta file, and video in a .jts file. The "ts" in .jts stands for time-shifting, which is what this whole thing was intended for in the beginning (time-shifting live TV). The last letter is changed to 'a' for the audio data, 'f' for format data, and 'i' for the index. So the .jtf file contains some format info, and the .jti file contains an index table to quickly map a time stamp to a file location in the audio or video data file.
Audio and video data files
The actual audio and video data are split into multiple chunks. That is why I call .jta and .jts "logical" files. The physical .jts and .jta files themselves should only contain the following header:
char m_cID[4]; // Header ID. must be "JRSR"
long m_nVersion;
INT64 m_FileSize; // the size of the audio or video data (the logical file size)
INT64 m_ChunkSize; // the size of each chunk
So to find data at a particular logical file position, you use the chunk size to figure out which chunk the data is actually in, and the offset into that chunk, as in the sketch below.
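As a sketch of that arithmetic (MapLogicalPosition is just an illustrative name; how the individual chunk files are named and located on disk is not covered here):
// maps a logical file position to the chunk holding it and the offset within that chunk
void MapLogicalPosition(INT64 llPos, INT64 llChunkSize,
                        INT64 &llChunkIndex, INT64 &llOffsetInChunk)
{
    llChunkIndex = llPos / llChunkSize;    // which chunk the data is in
    llOffsetInChunk = llPos % llChunkSize; // offset into that chunk
}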
The audio and video elementary stream data contained in these files are logically divided into "samples". Actually we do not divide them; we just take the samples from a DirectShow filter's output and save them to the files, without interpretation. When we play the recording, we load the data sample by sample and pass each sample to the downstream DirectShow filters. It is therefore up to you to interpret what each sample contains.
Audio and video sample headers
Each sample comes with a header. The following is the specification of the latest header:
struct MediaSampleHeader2
{
char cID[4]; // must be "MJTS"
LONG lPrevDataLen; // the size of previous sample, not including the header size
LONG lThisDataLen; // the size of the current sample, not including the header size
DWORD dwSampleIndex; // the serial number of the sample
REFERENCE_TIME rtStart; // the time-stamp (start time) of the current sample
REFERENCE_TIME rtEnd; // the time-stamp (end time) of the current sample
unsigned char bIsSyncPoint : 1; // the sample is a sync point
unsigned char bIsPreRoll : 1; // the sample is pre-roll
unsigned char bIsDiscontinuity : 1; // the sample represents discontinuity
unsigned char nChannelChange : 5; // channel change (the value changes by one, from 0 to 31 and back to 0, each time channel changes)
unsigned char nStreamType : 6; // MPEG1, MPEG2, AC3, AAC, etc. for audio, MPEG2, MPEG1, H264/AVC etc. for video
};
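To illustrate, here is a sketch of walking the samples forward. It assumes 1-byte packing of MediaSampleHeader2 and that the first sample header sits at logical position 0 of the data; ReadLogical is a hypothetical helper that reads bytes at a logical position, resolving chunks as described in the previous section.
// hypothetical helper: reads cb bytes at logical position llPos,
// spanning chunk boundaries as needed
bool ReadLogical(INT64 llPos, void *pBuffer, size_t cb);
void WalkSamples()
{
    INT64 llPos = 0; // assumption: first header at logical position 0
    MediaSampleHeader2 hdr;
    while (ReadLogical(llPos, &hdr, sizeof(hdr)) &&
           memcmp(hdr.cID, "MJTS", 4) == 0)
    {
        // hdr.lThisDataLen bytes of elementary stream data follow the
        // header; hdr.rtStart / hdr.rtEnd give the sample's time stamps
        llPos += sizeof(hdr) + hdr.lThisDataLen; // jump to the next header
    }
    // to walk backwards: a header at position p is preceded by the
    // previous sample's data, so the previous header starts at
    // p - sizeof(MediaSampleHeader2) - lPrevDataLen (lPrevDataLen taken
    // from the header at p)
}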
Older versions (from a few years ago) of jtv files use a different header; for now, I will just describe the latest version.
A little more explanation of the last two items in the above structure:
nChannelChange - the value increments by one on channel change.
nStreamType - the stream type of audio or video.
When the user switches channels, there is a possibility that the stream format changes as a result. The nChannelChange counter allows you to notice this and examine the sample data to determine whether there was a format change. If we actually detected a format change during recording, the nStreamType field will contain the new format. The formats are defined as follows:
// MediaSampleHeader2 allocates a 6-bit field for StreamType
// so the max allowed value is 63
// We reserve the value 15 for the TVAT_PinConnectionType/TVVT_PinConnectionType type, which is used for special purposes
// the other 63 values (0 - 14 and 16 - 63) are for actual stream types; currently only a handful are used
enum TVAudioTypes
{
TVAT_MPEG1 = 0,
TVAT_AC3 = 1,
TVAT_MPEG2 = 2,
TVAT_AAC = 3, // AAC_LATM, used in European (British) HD TV channels
TVAT_RAW_AAC1 = 4, // used in Hauppauge HD PVR / Colossus
TVAT_EAC3 = 5,
TVAT_PinConnectionType = 15,
};
enum TVVideoTypes
{
TVVT_MPEG2 = 0,
TVVT_MPEG1 = 1,
TVVT_AVCH264 = 2,
TVVT_MPEG4 = 3,
TVVT_PinConnectionType = 15,
};
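For example, inside the sample-walking loop sketched earlier you could watch the counter like this (nLastChannel starts at an out-of-range sentinel so the very first sample is examined too):
unsigned char nLastChannel = 0xFF; // outside the 0-31 range on purpose
// ... inside the loop over samples ...
if (hdr.nChannelChange != nLastChannel)
{
    nLastChannel = hdr.nChannelChange;
    // the channel changed (or this is the first sample), so check
    // whether the stream format changed as well
    switch (hdr.nStreamType) // for an audio stream
    {
    case TVAT_MPEG1: /* MPEG-1 audio */ break;
    case TVAT_AC3:   /* AC-3 */         break;
    case TVAT_AAC:   /* AAC (LATM) */   break;
    case TVAT_PinConnectionType: /* special purpose */ break;
    // etc.
    }
}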
Note that in earlier versions only 5 bits were allocated for these media type values. That is why TVAT_PinConnectionType and TVVT_PinConnectionType have the value of 15 (the largest possible value at the time). Now the value 15 is stuck, even though the largest possible value is now 63.
The Format file (.jtf)
The format file contains some basic format info. Again, the actual structure has evolved over several versions. The beginning portion of the format file contains the CTSFormatFile structure defined below (the latest version).
enum StreamIndex
{
Audio = 0,
Video = 1,
NumOfStreams // 2
};
class AudioVideoTimeStamps
{
public:
REFERENCE_TIME rtAudioMaxTime; // max audio time stamp
REFERENCE_TIME rtVideoMaxTime; // max video time stamp
DWORD dwAudioSampleCount; // total number of audio samples
DWORD dwVideoSampleCount; // total number of video samples
INT64 llAudioLatestPosition; // latest audio write position (a logical file offset)
INT64 llVideoLatestPosition; // latest video write position (a logical file offset)
};
#define TSFORMATFILE_VERSION 8
class CTSFormatFileV1
{
public:
int m_nVersion; // current version is 8
DWORD m_dwMaxTimeshift; // max time-shifting, in seconds
DWORD m_dwInsurance; // extra space allocated beyond the m_dwMaxTimeshift (it should be 2 - meaning 2% of m_dwMaxTimeshift)
int m_SplitterReaderMode; // should have a value of 2
BOOL m_bFilesStoppedGrowing; // should be true for TV recordings that have finished recording
AudioVideoTimeStamps m_TimeStamps; // defined above
};
class CTSFormatFile : public CTSFormatFileV1
{
public:
REFERENCE_TIME m_aryMinTimestamps[NumOfStreams]; // array of 2 elements, the first for audio, the second for video.
};
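A minimal sketch of reading this, using the classes above (version 8 only; older versions are laid out differently, and the same packing assumptions as before apply):
CTSFormatFile fmt;
FILE *pFile = fopen("recording.jtf", "rb"); // illustrative file name
if (pFile != NULL &&
    fread(&fmt, sizeof(fmt), 1, pFile) == 1 &&
    fmt.m_nVersion == TSFORMATFILE_VERSION)
{
    // fmt.m_TimeStamps.rtVideoMaxTime, fmt.m_aryMinTimestamps[Audio],
    // etc. are now available; the rest of the file follows from here
}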
The next block of data in the jtf file is the CMediaTypeHeader, defined below.
#define CUR_JRTV_FILE_VERSION 2
struct JRTVFileVersion
{
char cID[4]; // file ID. must be "JRTV"
long lFileVersion; // currently at 2, not changed since it was created
};
The JRTVFileVersion header is a bit redundant and not really significant. The significant version number is the one in CTSFormatFileV1, currently at version 8.
class CMediaTypeHeader : public JRTVFileVersion
{
public:
long m_lAudioBufferSize; // buffer size of audio data
long m_lVideoBufferSize; // buffer size of video data
};
The two numbers represent the best estimate of how large the audio and video data buffers should be for the DirectShow filters. These numbers should be greater than or equal to the data size of any sample (see "Audio and video data files" above).
Next, for format file version 8 (the current version) and above, the file contains two 32-bit integers:
int nVideoStandard; // video standard, following DirectShow definition. NTSC, PAL etc. To get frame rate correct.
int nAvailableStreams; // how many streams are available (audio, and/or video)
They are defined below. The video standard is as defined in DirectShow:
typedef enum tagAnalogVideoStandard
{
AnalogVideo_None = 0,
AnalogVideo_NTSC_M = 0x1,
AnalogVideo_NTSC_M_J = 0x2,
AnalogVideo_NTSC_433 = 0x4,
AnalogVideo_PAL_B = 0x10,
AnalogVideo_PAL_D = 0x20,
AnalogVideo_PAL_G = 0x40,
AnalogVideo_PAL_H = 0x80,
AnalogVideo_PAL_I = 0x100,
AnalogVideo_PAL_M = 0x200,
AnalogVideo_PAL_N = 0x400,
AnalogVideo_PAL_60 = 0x800,
AnalogVideo_SECAM_B = 0x1000,
AnalogVideo_SECAM_D = 0x2000,
AnalogVideo_SECAM_G = 0x4000,
AnalogVideo_SECAM_H = 0x8000,
AnalogVideo_SECAM_K = 0x10000,
AnalogVideo_SECAM_K1 = 0x20000,
AnalogVideo_SECAM_L = 0x40000,
AnalogVideo_SECAM_L1 = 0x80000,
AnalogVideo_PAL_N_COMBO = 0x100000,
AnalogVideoMask_MCE_NTSC = AnalogVideo_NTSC_M | AnalogVideo_NTSC_M_J | AnalogVideo_NTSC_433 | AnalogVideo_PAL_M | AnalogVideo_PAL_N | AnalogVideo_PAL_60 | AnalogVideo_PAL_N_COMBO,
AnalogVideoMask_MCE_PAL = AnalogVideo_PAL_B | AnalogVideo_PAL_D | AnalogVideo_PAL_G | AnalogVideo_PAL_H | AnalogVideo_PAL_I,
AnalogVideoMask_MCE_SECAM = AnalogVideo_SECAM_B | AnalogVideo_SECAM_D | AnalogVideo_SECAM_G | AnalogVideo_SECAM_H | AnalogVideo_SECAM_K | AnalogVideo_SECAM_K1 | AnalogVideo_SECAM_L | AnalogVideo_SECAM_L1
} AnalogVideoStandard;
// when both audio and video streams are available in the recording, the nAvailableStreams value should be 3
enum AvailableStream
{
AudioAvailable = 1,
VideoAvailable = 2,
};
Next are the audio and video media type data. This data is useful for setting up the initial DirectShow filter connections. The audio and video media types may or may not stay the same during various parts of the recording (see the description of "Audio and video sample headers" above).
The data structure for media types follows DirectShow's definition of AM_MEDIA_TYPE. The data is written to the file as follows:
First, the AM_MEDIA_TYPE structure, excluding the last member of the structure (BYTE *pbFormat) - therefore the size written is sizeof(AM_MEDIA_TYPE) - sizeof(DWORD *) (i.e. minus the size of one pointer).
Then, the binary format data (cbFormat bytes).
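To put the tail of the file together, here is a sketch of reading one media type block. It assumes a 32-bit build, where sizeof(AM_MEDIA_TYPE) - sizeof(BYTE *) is 68 bytes and the in-memory layout matches the file; a 64-bit reader would have to copy the fields individually. I am also assuming the audio block comes before the video block, each present only when the corresponding nAvailableStreams bit is set.
#include <dshow.h> // AM_MEDIA_TYPE; CoTaskMemAlloc comes in via windows.h
bool ReadMediaType(FILE *pFile, AM_MEDIA_TYPE &mt)
{
    memset(&mt, 0, sizeof(mt));
    // the structure is stored without its trailing pbFormat pointer
    // (all pointer types have the same size, hence sizeof(BYTE *))
    if (fread(&mt, sizeof(AM_MEDIA_TYPE) - sizeof(BYTE *), 1, pFile) != 1)
        return false;
    mt.pUnk = NULL;     // pointer values read from the file are meaningless
    mt.pbFormat = NULL;
    if (mt.cbFormat > 0)
    {
        mt.pbFormat = (BYTE *)CoTaskMemAlloc(mt.cbFormat);
        if (mt.pbFormat == NULL ||
            fread(mt.pbFormat, mt.cbFormat, 1, pFile) != 1)
            return false; // (a real reader would also free pbFormat here)
        // pbFormat now holds e.g. a WAVEFORMATEX or VIDEOINFOHEADER(2)
    }
    return true;
}
// usage, after reading nAvailableStreams:
//   AM_MEDIA_TYPE mtAudio, mtVideo;
//   if (nAvailableStreams & AudioAvailable) ReadMediaType(pFile, mtAudio);
//   if (nAvailableStreams & VideoAvailable) ReadMediaType(pFile, mtVideo);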