The file under the microscope
We are so used to many of the things around us that we don't even think about them. What could be simpler than a file? We open them, close them, edit them… But dig a little deeper, and you'll realise that every file has a certain structure that enables applications to parse it.
Let's try to find the definition of a file.
File: a sequence of data stored on a physical medium and identified by a name and an extension. The extension indicates the file type; it follows the file name and is separated from it by a full stop.
That's the way it was in the MS-DOS era: back then a file was merely a sequence of sectors on a floppy disk or a hard drive.
Some things have changed since then, but old definitions don't go away.
In the most general terms, a file can be defined (although it is only my humble opinion) as a sequence of bytes that can be identified by specific criteria.
This definition applies to any operating system.
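One such criterion, independent of the name, is the content itself: many formats begin with a fixed "magic number". The minimal Python sketch below guesses a type from the first few bytes. The signature table is a tiny hypothetical sample chosen for illustration, not a complete or authoritative list; real tools such as the Unix `file` utility recognise thousands of formats.

```python
# A tiny, deliberately incomplete signature table (illustrative only).
SIGNATURES = [
    (b"PK\x03\x04", "ZIP container (DOCX, XLSX, JAR and others)"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"MZ", "DOS/Windows executable (PE/MZ)"),
]

def sniff(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the file name."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"

def sniff_file(path: str) -> str:
    """Read just the header of a file and classify it."""
    with open(path, "rb") as f:
        return sniff(f.read(16))  # 16 bytes covers every signature above
```

A file called report.pdf whose first bytes are `MZ` is exactly the kind of mismatch between name and content that deserves a closer look.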
Well, well. Let's take the modern office document format DOCX as an example. You probably deal with it every day.
Try a simple experiment. Create a document, say, test.docx; save it, and close the editor window. Now rename it test.zip (this is not strictly necessary, but let's do it anyway just to make sure that it will work on any computer), and then try to open it with a file archiver.
An unexpected result, right? The office document is nothing more than an archive. And you can, for example, replace all the images in the file with the pictures you want. Anyone can do it.
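The same experiment can be done in code. The minimal Python sketch below uses the standard `zipfile` module, which does not care about the extension, so no renaming is needed: it lists the parts of a ZIP-based document and rebuilds it with some entries swapped. The entry name `word/media/image1.png` and the file names in the usage comments are assumptions about a typical DOCX, not guaranteed paths.

```python
import io
import zipfile

def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of all entries stored in a ZIP-based document such as DOCX."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()

def replace_parts(docx_bytes: bytes, replacements: dict[str, bytes]) -> bytes:
    """Rebuild the archive, swapping the contents of entries named in `replacements`.

    zipfile cannot overwrite an entry in place, so every entry is copied
    into a new archive in its original order.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zin:
        with zipfile.ZipFile(out, "w") as zout:
            for info in zin.infolist():
                data = replacements.get(info.filename, zin.read(info))
                zout.writestr(info, data)  # reuse the original entry metadata
    return out.getvalue()

# Usage, assuming test.docx exists and contains word/media/image1.png:
# from pathlib import Path
# original = Path("test.docx").read_bytes()
# print(list_parts(original))
# picture = Path("my_picture.png").read_bytes()
# Path("test_new.docx").write_bytes(
#     replace_parts(original, {"word/media/image1.png": picture}))
```

Keeping the same entry name and image format is the simple case; a picture in a different format would probably also require edits to the document's content-type and relationship parts.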
So, determining a file's structure and its individual components is the first tier in file analysis. There are also components within components and so on (you may also like to read our issue “Bombs and evolution”).
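"Components within components" is what makes unpacking risky: an archive may hold another archive, and so on, which is how decompression bombs work. Here is a minimal Python sketch of a recursive walk over nested ZIP archives with two safety limits. The default limits are hypothetical values, and the size check relies on the sizes declared in the archive headers, which a malicious file can falsify; this is an illustration, not how any particular anti-virus engine works.

```python
import io
import zipfile

def walk_nested(data: bytes, max_depth: int = 5, max_total: int = 100 * 2**20) -> list[str]:
    """List every entry, descending into ZIP archives stored inside ZIP archives.

    Nested entries are shown as "outer.zip!/inner.txt". Recursion stops at
    max_depth, and a ValueError is raised once the declared unpacked sizes
    add up to more than max_total bytes.
    """
    found: list[str] = []
    total = 0

    def visit(blob: bytes, prefix: str, depth: int) -> None:
        nonlocal total
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            for info in archive.infolist():
                total += info.file_size  # size as declared in the header
                if total > max_total:
                    raise ValueError("unpacked size limit exceeded")
                name = prefix + info.filename
                found.append(name)
                if info.is_dir():
                    continue
                inner = archive.read(info)
                if depth < max_depth and zipfile.is_zipfile(io.BytesIO(inner)):
                    visit(inner, name + "!/", depth + 1)

    visit(data, "", 0)
    return found
```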
But there's more! In Explorer you only see the tip of the iceberg. Depending on the file system, a file may carry information that Explorer never shows. If you want to find out more, read the issue "Tricks with files".
In fairy tales, heroes often carry around bags in which they put all sorts of things and yet the size and weight of the bags don't change. Files and directories in the NTFS file system, which is used in modern versions of Windows, are in fact such magic bags.
If you think it’s too much of a stretch to say that, you’re wrong.
Here, for example, is what Microsoft wanted to incorporate into Windows Server 2003 and Vista:
WinFS is a future file system built on the relational database engine of the next version of SQL Server (Yukon).
The new system takes into account content-based file attributes, including the author, content type, title, source and the person who last accessed the file. The folder structure in Windows Explorer is essentially a virtual map. It facilitates navigation between files but doesn't show how files are actually stored.
From a user's standpoint, there is no need to know where files are actually located. Instead, Windows organises data in virtual folders according to its type. A user searching for a file can use queries such as "All holiday photos from the last two years" (file type, origin and time period) instead of specifying the file format, author and location.
Developers can change the set of file attributes available in WinFS using XML metadata. They can also define relationships between attributes. For example, all the documents created by a specific author can be displayed along with information about that person, their address, and photos associated with them.
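To make the idea of "files as rows, attributes as columns, relationships as joins" concrete, here is a minimal Python sketch using the standard `sqlite3` module. It is only an analogy: WinFS never shipped, and the schema, table names and sample data below are hypothetical, not WinFS's actual schema or API.

```python
import sqlite3

def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalogue where files are rows and attributes are columns."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            title TEXT,
            kind TEXT,        -- content type: 'photo', 'document', ...
            tag TEXT,         -- user-facing category, e.g. 'holiday'
            created TEXT,     -- ISO 8601 date, so text comparison orders correctly
            author_id INTEGER REFERENCES people(id)
        );
    """)
    db.executemany("INSERT INTO people VALUES (?, ?, ?)",
                   [(1, "Alice", "1 Main St"), (2, "Bob", "2 Side Rd")])
    db.executemany(
        "INSERT INTO items (title, kind, tag, created, author_id) VALUES (?, ?, ?, ?, ?)",
        [("beach.jpg", "photo", "holiday", "2024-07-10", 1),
         ("ski.jpg", "photo", "holiday", "2021-01-05", 2),
         ("report.docx", "document", "work", "2024-03-01", 1)])
    return db

def holiday_photos_since(db: sqlite3.Connection, since: str) -> list[str]:
    """'All holiday photos since <date>': a query on attributes, not on folders."""
    rows = db.execute(
        "SELECT title FROM items WHERE kind = 'photo' AND tag = 'holiday' "
        "AND created >= ? ORDER BY title", (since,))
    return [title for (title,) in rows]

def items_by_author(db: sqlite3.Connection, name: str) -> list[tuple]:
    """Follow the item -> person relationship, returning each title with the author's address."""
    return db.execute(
        "SELECT items.title, people.address FROM items "
        "JOIN people ON people.id = items.author_id "
        "WHERE people.name = ? ORDER BY items.title", (name,)).fetchall()
```

Note that nothing here refers to a directory path: where the bytes live is irrelevant to the query, which is the point the quoted description makes.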
A set of viewing options in Explorer, as well as commands for specific file types, can also be changed. Developers can specify, for example, what options will appear in the drop-down menu for specific attributes and which will be displayed as icons. That's why the file manager in Seven can perform a completely new range of tasks. For example, a developer can indicate which commands are to be executed if a user searches for data having certain attributes. If a user searches through an email archive, Windows Explorer can launch Outlook to compose a reply message that will be sent if the user clicks on a corresponding button.
Relationships are also likely to be established with security permissions using the Next-Generation Secure Computing Base (NGSCB), elements of which are already available in the alpha version of Seven. It is possible that at some point Windows will be able to organise files in accordance with security criteria.
A file system as a database: all the data that appears in the system is automatically organised and provided to users whenever needed. Separate files no longer exist; there is only automatically organised information. It was an ambitious idea, but it didn't work out. And anti-virus developers breathed a sigh of relief.
And how about a file system with back-up features?
ReFS uses a copy-on-write (allocate-on-write) strategy for metadata: instead of overwriting metadata in place, every update transaction writes to newly allocated chunks, and the writes are issued in large I/O batches.
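Why does copy-on-write give a file system back-up features? Because old data is never overwritten, the previous version survives every update. The toy Python store below illustrates only that principle; it is not how ReFS is implemented (ReFS uses on-disk B+ trees and much more), and the class and method names are hypothetical. In a real file system the final "switch" would be an atomic write of a root pointer, so a crash before it leaves the old version consistent.

```python
from typing import Dict, List, Optional

class CowStore:
    """Toy copy-on-write store: committed blocks are never overwritten."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}          # block id -> contents
        self.root: Dict[str, int] = {}              # current metadata: name -> block id
        self.snapshots: List[Dict[str, int]] = []   # earlier roots, still readable
        self._next_id = 0

    def _alloc(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def commit(self, updates: Dict[str, bytes]) -> None:
        """Apply a batch: write every change to fresh blocks, then switch the root once."""
        new_root = dict(self.root)           # copy the map; the live one is untouched
        for name, data in updates.items():
            new_root[name] = self._alloc(data)
        self.snapshots.append(self.root)     # the previous version stays intact
        self.root = new_root                 # a single assignment publishes the batch

    def read(self, name: str, version: Optional[Dict[str, int]] = None) -> bytes:
        table = self.root if version is None else version
        return self.blocks[table[name]]
```

Batching several updates into one `commit` mirrors the "large batches" idea: one root switch covers many changes.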
BeOS/Haiku boasts even more sophisticated features:
In addition to file name, size and location, the system also knows a lot of other information.
For example, all pre-installed applications are stored in /system/apps. But the Tracker file manager doesn't treat them as mere files. It recognises them as programs for a specific OS version and knows each program's version and author. All this information is written to the file system and associated with the respective files. So the system regards the list of applications in the Deskbar and the Tracker's file list as the same piece of data.
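A comparable mechanism on today's Linux systems is extended attributes: name/value pairs stored by the file system alongside a file. The Python sketch below attaches and reads back such attributes using `os.setxattr`, `os.getxattr` and `os.listxattr`, which exist only on Linux. This is an analogy, not the BeOS API, and the attribute names (`app.version` and so on) are hypothetical; whether `user.` attributes work also depends on the file system, so the code reports failure instead of crashing.

```python
import os
import tempfile

def tag_file(path: str, attrs: dict[str, str]) -> bool:
    """Attach user.* extended attributes; return False if the OS or file system lacks support."""
    if not hasattr(os, "setxattr"):   # the xattr functions exist only on Linux
        return False
    try:
        for key, value in attrs.items():
            os.setxattr(path, "user." + key, value.encode("utf-8"))
    except OSError:                   # e.g. the file system does not support user xattrs
        return False
    return True

def read_tags(path: str) -> dict[str, str]:
    """Return all user.* attributes of a file, without the 'user.' prefix."""
    return {
        name[len("user."):]: os.getxattr(path, name).decode("utf-8")
        for name in os.listxattr(path)
        if name.startswith("user.")
    }

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        if tag_file(f.name, {"app.version": "1.2", "app.author": "Jane"}):
            print(read_tags(f.name))
        else:
            print("extended attributes are not supported here")
```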
The Tracker doesn't offer the option to copy and paste a file. Instead, the drop-down menu contains the “Copy to” item. Selecting it brings up a submenu containing a list of destination directories. This turned out to be a very convenient way to quickly copy or move files.
Now let's consider a seemingly simple task: copying an audio CD to a hard drive. Windows users often asked me why the files they had copied wouldn't play. Of course: only the .cda files had been copied, and these are tiny stubs that merely point to the tracks on the disc. So I had to launch into a lengthy explanation about audio CD grabbers. And how did it work in BeOS? Open an audio CD, see files with the .wav extension, and copy them to the hard drive. That's it! No other software was required.
It felt like a revolution!
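The reason a grabber is needed at all: an audio CD track is raw PCM in 2352-byte sectors, 75 per second, with no file header. The arithmetic checks out: 75 × 2352 = 176,400 bytes per second, which is 44,100 Hz × 2 channels × 2 bytes. Turning a track into a playable file largely means wrapping those samples in a container. The Python sketch below does that with the standard `wave` module; it assumes the raw samples are already 16-bit little-endian (the byte order WAV expects), and it is not a claim about how BeOS did it internally.

```python
import io
import wave

SECTOR_BYTES = 2352        # one audio CD sector = 588 stereo 16-bit frames
SECTORS_PER_SECOND = 75    # 75 * 2352 = 176,400 bytes/s = 44,100 Hz * 2 ch * 2 bytes

def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw CD-DA samples (16-bit, stereo, 44.1 kHz) in a WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(44100)
        w.writeframes(pcm)
    return buf.getvalue()

def track_seconds(sector_count: int) -> float:
    """Playing time of a track, given its length in sectors."""
    return sector_count / SECTORS_PER_SECOND
```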
And under Linux/Unix any piece of data is either a process or a file. Devices that are available in the system, as well as RAM and much more, are presented as files.
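This is easy to see in practice: a device and kernel information are opened and read with the same calls as an ordinary file. The Python sketch below reads from the `/dev/urandom` character device and parses total RAM from the `/proc/meminfo` pseudo-file. Both paths are Linux-specific; other Unix systems may lack `/proc/meminfo`.

```python
def read_device_bytes(n: int = 16) -> bytes:
    """Read random bytes from a character device exactly as from a regular file."""
    with open("/dev/urandom", "rb") as dev:
        return dev.read(n)

def mem_total_kib() -> int:
    """Parse total RAM from the /proc/meminfo pseudo-file (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])  # the value is reported in kB
    raise RuntimeError("MemTotal not found in /proc/meminfo")
```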
And a file is not necessarily a sequence at all. In most operating systems, a file is indeed data written to a sequence of sectors on a disk or other media, and references to those sectors are stored in a designated area of that media. But in some file systems, small files can be stored right in the area where the location data for other files is usually recorded (NTFS, for example, can keep a very small file directly inside its Master File Table record). This increases access speed.
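The difference between a "resident" file and an ordinary one can be modelled with a toy volume in Python. Everything here is hypothetical: the threshold, block size and class names are illustrative, and real NTFS limits depend on the size of the Master File Table record. Small files live inside their table entry; larger ones are stored as blocks that the entry points to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

RESIDENT_LIMIT = 700   # hypothetical threshold in bytes
BLOCK_SIZE = 4096      # hypothetical data block size

@dataclass
class Entry:
    name: str
    size: int
    resident_data: Optional[bytes] = None            # small files live right here
    blocks: List[int] = field(default_factory=list)  # large files: pointers to data blocks

class ToyVolume:
    def __init__(self) -> None:
        self.table: Dict[str, Entry] = {}   # the "file table" area
        self.disk: List[bytes] = []         # the data area, one item per block

    def write(self, name: str, data: bytes) -> Entry:
        entry = Entry(name, len(data))
        if len(data) <= RESIDENT_LIMIT:
            entry.resident_data = data      # no data blocks are allocated at all
        else:
            for i in range(0, len(data), BLOCK_SIZE):
                self.disk.append(data[i:i + BLOCK_SIZE])
                entry.blocks.append(len(self.disk) - 1)
        self.table[name] = entry
        return entry

    def read(self, name: str) -> bytes:
        entry = self.table[name]
        if entry.resident_data is not None:
            return entry.resident_data      # found with the table lookup itself
        return b"".join(self.disk[b] for b in entry.blocks)
```

For a scanner this matters: data hidden in the table area never appears in the data area, so both have to be examined.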
Those are only a few tricks that can be used to hide organised data, i.e., files placed in secret locations. There are also symbolic and hard links, and bodiless viruses that store their code in the Windows Registry. All sorts of things exist. And an anti-virus patiently parses every piece of data in a system while you remain completely unaware of what it's doing.
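Links show why a file's name and its data are separate things. The Python sketch below uses the POSIX calls `os.link` and `os.symlink` in a temporary directory: a hard link is a second name for the same data, so the data survives deletion of the "original" name, while a symbolic link only stores a path and dangles once that path disappears. The file names are hypothetical, and the demo assumes a Unix-like system (on Windows, creating symbolic links may require extra privileges).

```python
import os
import tempfile

def demo_links(workdir: str) -> dict:
    original = os.path.join(workdir, "data.txt")
    hard = os.path.join(workdir, "innocent.log")   # hypothetical, harmless-looking name
    soft = os.path.join(workdir, "shortcut")
    with open(original, "w") as f:
        f.write("payload")
    os.link(original, hard)      # hard link: a second directory entry for the same inode
    os.symlink(original, soft)   # symbolic link: a small object that only stores a path
    report = {
        "nlink": os.stat(original).st_nlink,
        "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
        "symlink_target": os.readlink(soft),
    }
    os.remove(original)          # delete the "original" name...
    with open(hard) as f:
        report["hard_link_still_reads"] = f.read()   # ...the data is still reachable
    report["symlink_dangling"] = os.path.islink(soft) and not os.path.exists(soft)
    return report

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_links(d))
```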