Sunday 27 November 2011

Clipboard

Since a few people have asked for it, you can now optionally set the default clipboard format to "hex text".  Actually there are now four options, as explained below.  There are some advantages to the way HexEdit has always worked with the clipboard, but the new options allow users to configure the program to work the way they expect or want it to work.

Moreover, copying a selection of any size to the clipboard is now supported (previously there was a limit of 16 MBytes).

History

When I first started work on HexEdit in 1997 I tried another Windows hex editor (I think it was called Hex Workshop).  When I used it to copy and paste data it had the very poor behaviour that all bytes with the value zero (ie nul bytes) were lost.  On investigation I discovered that it was copying data to the clipboard simply as text, but the Windows clipboard text format is not designed to handle binary data, which meant that some byte values were lost.

Then I discovered that you could copy and paste binary data using Visual Studio (actually it was called Developer Studio in those days).  The Visual Studio hex editor uses a special binary format that allows all byte values to be copied and pasted.  This was the obvious clipboard format to use with a binary file editor.  However, the disadvantage with the way Visual Studio worked is that you could not copy text data onto the clipboard to paste into another program such as a text editor.

HexEdit 1.0 Clipboard Formats

If you didn't know, the Windows clipboard allows you to copy the same "thing" onto the clipboard in several different formats at once.  For example, you could copy a picture as separate bitmap and vector graphics formats and even perhaps as some sort of text that describes the picture.  This allows great flexibility for programs to interact.

The obvious solution for HexEdit was, when copying data to the clipboard, to copy it in two separate formats:
1. as binary data (in the format used by Visual Studio)
2. in standard text format to allow pasting into other programs

When pasting, HexEdit would use the binary data format if present, so there was no chance of any loss of data when copying and pasting within HexEdit itself.  This also meant that sharing data with the Visual Studio hex editor was simple.

If the binary data format was not present, and the text format was present, it would just paste the text data.
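The copy and paste rules above can be sketched in a few lines of portable C++.  This is only an illustration, not HexEdit's source: the Clipboard struct and the copyBoth and paste functions are hypothetical stand-ins for the real Windows clipboard calls, and the sketch assumes the text copy stops at the first nul byte (as a nul-terminated string would) and ignores character set conversion.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical model of a clipboard that can hold the same data in two formats.
struct Clipboard {
    bool hasBinary = false;
    Bytes binary;        // binary format: every byte value survives
    bool hasText = false;
    std::string text;    // text format: for pasting into other programs
};

// Copy: publish the data in both formats at once.
Clipboard copyBoth(const Bytes& data) {
    Clipboard clip;
    clip.hasBinary = true;
    clip.binary = data;
    clip.hasText = true;
    for (unsigned char b : data) {
        if (b == 0) break;   // assumption: text ends at the first nul byte
        clip.text.push_back(static_cast<char>(b));
    }
    return clip;
}

// Paste: prefer the binary format, fall back to text, else paste nothing.
Bytes paste(const Clipboard& clip) {
    if (clip.hasBinary) return clip.binary;
    if (clip.hasText) return Bytes(clip.text.begin(), clip.text.end());
    return Bytes();
}
```

Because the binary format is checked first, a copy and paste within the sketch always round-trips every byte, while a clipboard filled by a text editor still pastes as plain text.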

Text Formats

However, the above system only provided simple text support (but still much better than Visual Studio's support, which was none).

The HexEdit character area has always supported display in different character sets, so when copying to the clipboard as text it was obvious that the text should be converted from the current character set into the Windows clipboard text format (which was ASCII for Windows 95 and descendants, or Unicode for Windows NT/2K/XP/7).  For example, this allowed you to easily extract data from an EBCDIC text file and paste it into a Windows text editor or word processor which worked with ASCII or Unicode.

So HexEdit allowed you to convert when pasting into a different program but this ability was lost when copying and pasting within HexEdit itself.  That is, you could not simply copy and paste within HexEdit to convert from one character set to another since HexEdit first looks for a binary format and pastes it verbatim (ie, with no conversion).

So I added the Paste As ASCII/EBCDIC/Unicode commands to allow pasting and conversion of text.  Many users, especially those working with EBCDIC text, found this very useful.

Hex Text

Another deficiency was that sometimes you want to copy the binary data as "hex text".  For example, say you have data containing an ASCII 'A' character, an ASCII 'B' character and a nul byte.  When copied to the clipboard in HexEdit this would have been copied as:

binary data: 0x41, 0x42, 0x00  (3 bytes)
text data:   "AB"              (2 bytes)

But sometimes you want the equivalent "hex text":

text data:  "41 42 00"         (8 bytes including spaces)

One use of this is to copy the hex values into a text editor - which some people like to do for some reason.  Another use is to copy an address from the binary data in a file as "hex text" and paste it into the Hex Jump Tool to jump to the address.

This was why I added the "Copy as Hex Text" command to HexEdit.
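A conversion like this is short to write.  The sketch below is an assumption about how it could be done, not the code behind the "Copy as Hex Text" command; the function name toHexText and the choice of upper-case digits separated by single spaces are mine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Formats bytes as two hex digits each, separated by single spaces.
std::string toHexText(const std::vector<unsigned char>& data) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i != 0) out.push_back(' ');         // separator between bytes
        out.push_back(digits[data[i] >> 4]);    // high nibble
        out.push_back(digits[data[i] & 0x0F]);  // low nibble
    }
    // n bytes give 3n - 1 characters (two digits each plus n - 1 spaces).
    assert(out.size() == (data.empty() ? 0 : data.size() * 3 - 1));
    return out;
}
```

For the three bytes 0x41, 0x42, 0x00 this returns "41 42 00", the 8 characters shown above.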

Paste From Hex Text

Now you would think that the above facilities would cover any possible contingency.  However, some users wanted more.  Some were used to other hex editors which took the simplistic approach of simply copying and pasting all data as "hex text".  Someone complained that since there was a "Copy as Hex Text" command there should be a corresponding "Paste From Hex Text" command.  (There is an "Import from Hex Text" command, which reads hex text from a file but no equivalent to read hex text from the clipboard.)
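Going the other way means parsing the text back into bytes.  Here is a minimal sketch of such a parser, assuming pairs of hex digits with optional whitespace between the pairs; the names hexDigitValue and fromHexText are hypothetical, and rejecting the whole input on the first bad character is a design choice of the sketch, not a description of HexEdit's behaviour.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Value of one hex digit, or -1 if c is not a hex digit.
int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses text such as "41 42 00" (or "414200") into bytes.
// Returns false, leaving 'out' empty, if the text contains a non-hex
// character, splits a byte with whitespace, or ends with half a byte.
bool fromHexText(const std::string& text, std::vector<unsigned char>& out) {
    out.clear();
    int high = -1;  // pending high nibble, or -1 if none
    for (char c : text) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            if (high != -1) { out.clear(); return false; }
            continue;
        }
        int v = hexDigitValue(c);
        if (v < 0) { out.clear(); return false; }
        if (high < 0) {
            high = v;
        } else {
            out.push_back(static_cast<unsigned char>(high * 16 + v));
            high = -1;
        }
    }
    if (high != -1) { out.clear(); return false; }
    return true;
}
```

A real paste command might prefer to skip separators such as commas or "0x" prefixes rather than fail; the strict version just keeps the sketch easy to reason about.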

Apparently several other hex editors do all their clipboard interaction using "hex text".  This allows binary data to be handled without loss and also allows pasting "hex text" into a text editor.  However, it does have a few problems:

a. It is much slower for large amounts of data due to the conversion time for both copying and pasting.
b. It uses more memory, as there are at least 3 bytes stored on the clipboard for every byte of binary data.
c. You can't copy actual text data from the hex editor into a text editor, since it is always converted to "hex text".

However HexEdit Pro 4.0 provides this facility as an option.  (See Workspace/Editing page of the Options dialog.)  Actually, there are now four options:


  1. As HexEdit has always done it (binary data and text data)
  2. Always use "hex text"
  3. If the data appears to be text then use option 1 above, else use option 2.  This allows copying of actual text data so it can be pasted into something else.
  4. When the selection was made in the "character" area (to right of hex area) use option 1 above, else (if selection is in the hex area) use option 2.
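The four options boil down to one decision per copy: use "hex text" or not.  The sketch below shows that decision under stated assumptions.  The enum names, the function useHexText and especially the looksLikeText heuristic (printable ASCII plus tab, CR and LF) are hypothetical; the text above does not say how HexEdit decides that data "appears to be text".

```cpp
#include <cassert>
#include <vector>

// The four options, in the order listed above (names are hypothetical).
enum class ClipboardMode { BinaryAndText, HexText, AutoByContent, AutoByArea };
enum class SelectionArea { Hex, Character };

// Assumed heuristic: data is "text" if every byte is printable ASCII
// or a tab, carriage return or line feed.
bool looksLikeText(const std::vector<unsigned char>& data) {
    for (unsigned char b : data) {
        bool printable = (b >= 0x20 && b <= 0x7E);
        bool whitespace = (b == '\t' || b == '\r' || b == '\n');
        if (!printable && !whitespace) return false;
    }
    return true;
}

// Returns true if the copy should use "hex text" (option 2),
// false if it should use binary data plus text (option 1).
bool useHexText(ClipboardMode mode, SelectionArea area,
                const std::vector<unsigned char>& data) {
    switch (mode) {
    case ClipboardMode::BinaryAndText: return false;
    case ClipboardMode::HexText:       return true;
    case ClipboardMode::AutoByContent: return !looksLikeText(data);
    case ClipboardMode::AutoByArea:    return area == SelectionArea::Hex;
    }
    assert(false && "unknown clipboard mode");
    return false;
}
```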

I think the above options should cover just about anything, but if you have any feedback on improvements they are most welcome.

Large Selections

However, this is not the only improvement to clipboard use in HexEdit Pro 4.0.  One problem that a small number of users have encountered is when working with very large amounts of data.  The problem is that Windows limits the amount of data that can be placed on the clipboard.  (Under Windows 95 this was 16 MBytes, but under Windows 2000/XP/7 the maximum size varies depending on the amount of RAM in the system but is usually much less than 100 MBytes.)

When placing large amounts of data on the clipboard HexEdit Pro 4.0 uses a custom clipboard format, which uses a temporary file on disk to store the actual data.  All that is placed in memory on the clipboard is the name of the data file.

This facility allows copying and pasting blocks of unlimited size, limited only by available disk space in the user's temporary data area.  This has been tested with a selection of 100 GigaBytes, which is more than 1,000 times bigger than could previously be handled.  Of course, it takes a bit of time to copy this much data around on disk.
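The idea of putting only a file name on the clipboard can be sketched as follows.  This is an illustration under assumptions, not the HexEdit Pro implementation: FileBackedClip, copyLarge, pasteLarge and discard are hypothetical names, the caller supplies the temporary file path, and the paste reads the whole file into memory, whereas a real editor handling 100 GBytes would have to stream the data in chunks.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// What this sketch places "on the clipboard": only the name of the data file.
struct FileBackedClip {
    std::string path;
};

// Copy: write the selection to a temporary file and remember its name.
bool copyLarge(const std::vector<unsigned char>& data,
               const std::string& tempPath, FileBackedClip& clip) {
    std::ofstream out(tempPath.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    if (!data.empty())
        out.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(data.size()));
    out.close();
    if (!out) return false;  // write or close failed (for example, disk full)
    clip.path = tempPath;
    return true;
}

// Paste: read the data back from the file named on the clipboard.
bool pasteLarge(const FileBackedClip& clip, std::vector<unsigned char>& data) {
    std::ifstream in(clip.path.c_str(), std::ios::binary);
    if (!in) return false;
    data.assign(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
    return true;
}

// Clean up the temporary file once the clipboard contents are replaced.
void discard(FileBackedClip& clip) {
    if (!clip.path.empty()) std::remove(clip.path.c_str());
    clip.path.clear();
}
```

The clipboard itself then holds only a short string, so its size no longer depends on the size of the selection.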

Monday 17 October 2011

Bookmarks

If you have ever had to maintain a large, poorly designed piece of software then you know that even a trivial change can take a long time to make.  The problem is not actually making the changes to the code but working out what to do, and often most of the time goes on finding exactly where to make the change.  Personally, I have often spent hours, even days, trying to find something, but once found the actual code change might only take a few minutes.  (Note that I am not actually talking about HexEdit here, though I admit that the source for it can suffer from this problem as it is poorly designed in places.)

VC++ Named Bookmarks

Back when I first started using the Microsoft C++ compiler (VC++ 4.1, I think) I found the named bookmark system absolutely indispensable for keeping track of important places in the code.  I also used it to keep track of all bugs that I fixed by prefixing the bookmark name with a bug number - that way I could easily find the relevant code if I had a similar or related bug.  I liked it so much that I modelled HexEdit's bookmark system on the Visual Studio system.  (Actually I think the VC++ IDE was called Developer Studio, not Visual Studio, in those days.)

The things that made the bookmark system particularly useful were that bookmarks were persistent, project-wide and you could assign each bookmark a meaningful name.  You needed to know nothing but the name of the bookmark to be able to jump straight to the source code.  You could even jump to a bookmark in a file that was not even open and the IDE would kindly open it for you.

[The worst thing was that the bookmarks were stored in a binary format and you could easily lose all your bookmarks if the file became corrupted, which often happened.  I quickly learnt to backup the bookmarks file regularly but not being text it was not amenable to being placed under version control.]

As I said, I modelled the HexEdit bookmarks on the VC++ 6 (DevStudio) bookmarks system but there were also some improvements...

Improved Dialog

First I made the bookmarks dialog modeless so you could leave it open to always be able to see the list.  This also makes it simple to jump to a bookmark (just by double-clicking it) and it's also much easier to add a new one.

It is just stupid for a dialog that contains a list to not be resizeable, especially as monitor resolutions keep increasing.  I always found the pokey little Dev Studio dialog to be annoying.

Finding Bookmarks

The HexEdit bookmarks dialog supports several columns of information which can be resized and even hidden.  Sorting on a particular column can make it easy to find the bookmark you are looking for.

One thing I find annoying is when you can't remember the name of a bookmark.  HexEdit also keeps track of when each bookmark was last modified (created or moved) and when it was last accessed, or jumped to.  This information can be useful to find a bookmark you can't remember the name of.

Fixing Annoyances

It's annoying when you go to jump to a bookmark but the file it points to no longer exists on disk.  HexEdit provides a Validate button so you can check that all the bookmarks are valid.

I also got rid of the silly idea of having completely separate lists of named and unnamed bookmarks.  In HexEdit if you create an "unnamed" bookmark (eg by pressing Ctrl+F2) it adds a bookmark prefixed with "Unnamed" such as "Unnamed001".
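Generating such names is simple.  The sketch below assumes that the lowest unused three-digit number is chosen; the function nextUnnamedBookmark is hypothetical and HexEdit may well number its unnamed bookmarks differently.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>

// Returns the first name of the form "UnnamedNNN" that is not already in use.
std::string nextUnnamedBookmark(const std::set<std::string>& existing) {
    for (int n = 1; ; ++n) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "Unnamed%03d", n);  // zero-padded to 3 digits
        if (existing.count(buf) == 0) return buf;
    }
}
```

Since the generated name goes into the same list as every other bookmark, an "unnamed" bookmark can be sorted, renamed and jumped to like any named one.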


One annoyance I encounter when using HexEdit is that bookmarks mysteriously seem to move.  (And no, it is not a bug in HexEdit.)  I suspect the most common reason is that the file has been modified outside of HexEdit, leaving the bookmark slightly out of place.  As a way around this I recently added a very useful feature -- if you see a bookmark is in the wrong place you can just click and drag it to the right place.

Bookmarks Tool

Before I forget I should mention the bookmarks tool.  This is a drop list on the standard toolbar which shows all the bookmarks in the active file.  As you move around in a file the tool updates to show the closest bookmark above the cursor.  What I find really convenient is the ability to select a bookmark from the list to quickly jump to it.
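Finding the closest bookmark above the cursor is a natural fit for an ordered map keyed on file address.  This is a sketch only: the function closestBookmarkAbove is hypothetical, it reads "above" as "at or before the cursor address", and it assumes at most one bookmark per address.

```cpp
#include <cassert>
#include <map>
#include <string>

// Bookmarks of one file, keyed by address. Finds the bookmark at or before
// 'cursor'; returns false if every bookmark lies after the cursor.
bool closestBookmarkAbove(const std::map<unsigned long long, std::string>& bookmarks,
                          unsigned long long cursor, std::string& name) {
    // upper_bound gives the first bookmark strictly after the cursor...
    std::map<unsigned long long, std::string>::const_iterator it =
        bookmarks.upper_bound(cursor);
    if (it == bookmarks.begin()) return false;  // nothing at or before the cursor
    --it;                                       // ...so the one before it is the answer
    name = it->second;
    return true;
}
```

The lookup is logarithmic in the number of bookmarks, so it is cheap enough to run every time the cursor moves.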

Visual Studio Bookmarks

When VS.Net (aka Visual Studio 2002) was released Microsoft rewrote the IDE and accidentally or intentionally removed almost all support for bookmarks.  This was a major step backward from VC++6 so obviously I wrote an email of complaint.  I also made a list of suggestions in the same or another email including most of the facilities in HexEdit:
  • modeless, resizeable, dockable bookmarks window
  • sortable list with multiple columns
  • Ctrl+F2 (unnamed bookmark) should just create a special name and add it to the list of named bookmarks
  • bookmark categories
I don't know if anyone even looked at my emails, but since then Visual Studio has added all these features and a few more.  Though the features are good, the implementation sucks, but discussion of that may have to wait for another post.

Wednesday 28 September 2011

Recent File Lists

Have you ever gone to open a file from your recent file list and found that it has gone?

This has happened to me often (in Word, NotePad++, and other programs) and it is extremely annoying especially if you can't remember where the file is stored.  The trouble is you come to rely on being able to open a special file (or several files) from the list and consequently forget where it is on the disk.  Then it takes a lot of time to go and find it.

For example, just now I went to open an Excel spreadsheet with a table of credit card numbers (test numbers not real ones for those of you who were wondering).  Because I had also been recently working with some test scripts in Excel spreadsheets the list of about 20 files in the recent file list did not include the file I needed anymore.  I searched my local disk and could not find it, and I really didn't have the time to search our large network drive.  (Luckily, I found a copy attached to an old email.)

This is why I find it surprising that I have not seen any other software copy my idea of the "Recent File Dialog" which has been in HexEdit for almost 10 years.  It basically keeps track of all the files you have ever opened (as well as the time you last opened it in HexEdit and other things), until you delete them from the list or clear the list completely.

If anybody knows of other software that has copied HexEdit's approach please let me know!

Windows 7

I guess the above problem is the reason that Windows 7 has added the facility for frequently used file lists.  For software that supports it you can right-click the program icon on the Taskbar and see a list of frequently used files (as well as the familiar recently used file list).  I applaud this improvement but there are additional advantages to the system provided in HexEdit ...

What if you have forgotten the name of the file but remember (or can work out) when you created it, modified it or last opened it?  You can sort the list of files in HexEdit on any column such as time created or modified, size, or the time you last opened it.
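Sorting a list of entries on a chosen column is straightforward with the standard library.  The sketch below is an illustration, not HexEdit's code: the RecentFile fields, the SortColumn enum and the rule that the most recently opened file sorts first are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <string>
#include <vector>

// One entry in a recent file list (fields chosen for illustration).
struct RecentFile {
    std::string name;
    std::time_t lastOpened;
    unsigned long long size;
};

enum class SortColumn { Name, LastOpened, Size };

// Sorts the list on one column. stable_sort keeps the previous order for
// ties, so sorting on one column and then another acts like a secondary key.
void sortRecentFiles(std::vector<RecentFile>& files, SortColumn column) {
    std::stable_sort(files.begin(), files.end(),
        [column](const RecentFile& a, const RecentFile& b) -> bool {
            switch (column) {
            case SortColumn::Name:       return a.name < b.name;
            case SortColumn::LastOpened: return a.lastOpened > b.lastOpened;  // newest first
            case SortColumn::Size:       return a.size < b.size;
            }
            return false;
        });
}
```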

You can also use the category, keywords and comments fields (since HexEdit 3.2) to make it easy to find the files you are looking for.  See this forum post for more: http://hexedit.com/cgi-bin/ikonboard.cgi?act=ST;f=6;t=2.

Deleting Entries in the Recent File List

Another annoyance I just noticed with Excel (2007 version) is that you can't delete from the recent file list.  I right-clicked on a file in the list and, though a popup menu appeared, there was no option to remove the file from the list.

You may ask why would anyone want to do this?  There are many reasons:
  • to remove a file from the list that you know you will never open again
  • to remove a file that you know has been moved or deleted
  • to remove a file that you don't want anyone to know you opened
  • to remove unimportant files so that the list is not cluttered
This is easy in the HexEdit Recent File Dialog.  Just select the file or files and click the "Remove" button.

You can also have HexEdit automatically remove files from the list if they no longer exist on disk - for example, if they have been deleted.
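Pruning entries whose files have disappeared can be sketched like this.  The names fileExists and pruneMissing are hypothetical, and the existence check (can the file be opened for reading?) is a portable stand-in for whatever test HexEdit really uses; the check is passed in as a parameter so that the pruning logic can be exercised without touching the disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Portable existence check: true if the file can be opened for reading.
bool fileExists(const std::string& path) {
    return std::ifstream(path.c_str()).good();
}

// Removes every entry for which 'exists' returns false.
// Returns the number of entries removed.
std::size_t pruneMissing(std::vector<std::string>& recent,
                         const std::function<bool(const std::string&)>& exists = fileExists) {
    std::size_t before = recent.size();
    recent.erase(std::remove_if(recent.begin(), recent.end(),
                     [&exists](const std::string& p) { return !exists(p); }),
                 recent.end());
    return before - recent.size();
}
```

Note that a file on a disconnected network drive would also fail this check, so a real implementation may want to be more careful before dropping an entry.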

Monday 19 September 2011

When is a Leak not a Leak?


I received an email recently pointing out that HexEdit has memory leaks.  If you build and run the debug version of the new HexEdit 4.0 beta then when the program exits the debug heap spits out many messages about memory "leaks".  I have tracked down several of these "leaks" and it turns out they are not really leaks but one-off allocations for singleton objects.  Such an allocation is better described as a "splash".

I think it is pretty poor of the MS CRT (Microsoft C run-time) to pronounce all non-deallocated memory as "leaks", especially as the CRT itself does one-off allocations (eg for the FILE table) but uses a special flag so that its own "leaks" are hidden.  Often it is harmless or even desirable to not deallocate one-off objects.  But when the CRT debug heap uses the word "leak" it seems to create panic in some people who should know better.

I am pretty diligent about avoiding memory leaks.  Here I define a leak as a continual loss of memory during the running of the software (or some part or parts of the software) that eventually will result in memory exhaustion.  Typically this occurs when a function allocates an object on the heap but forgets to free it when it returns to the caller, leaving an unused but allocated block of memory.  Every time this function is called more memory is lost.

In modern C++, using auto_ptr and the like, it is fairly easy to avoid accidentally doing this.  Some of HexEdit was written before the advent of auto_ptr though; nevertheless I believe there are no such leaks in HexEdit.
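The difference between a leak and a "splash" is easy to show in code.  The sketch below is not taken from HexEdit: the Tracked class and the three functions are hypothetical, and it uses std::unique_ptr, the C++11 successor to auto_ptr, as the smart pointer.

```cpp
#include <cassert>
#include <memory>

// Counts how many objects are currently alive, so the effect is visible.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// A real leak: every call loses another object that is never deleted.
void leaky() {
    Tracked* p = new Tracked;
    (void)p;  // returns without delete, so the memory is lost
}

// No leak: the smart pointer deletes the object when the function returns.
void tidy() {
    std::unique_ptr<Tracked> p(new Tracked);
}

// A "splash": a one-off allocation that is never freed, but also never
// repeated, however many times the function is called.
Tracked& singleton() {
    static Tracked* instance = new Tracked;
    return *instance;
}
```

Calling leaky() in a loop grows the live count without bound, calling tidy() leaves it unchanged, and calling singleton() adds exactly one object no matter how often it is called.  Only the first of these can ever exhaust memory, though a debug heap that reports everything unfreed at exit will flag the first and the last alike.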

My point is that I am not too bothered about all the messages about leaks from the debug CRT.  At various times I have tracked down many of these and they have always turned out to be harmless splashes.

However, I have worked in teams where the managers insist that all such "leaks" are fixed.  And if you have developers who are prone to introducing leaks then that is probably a good idea.  That way, when you do get a real leak it is not hidden amongst all the other bogus "leaks" in the debug CRT leak report.

On the other hand I have seen software which allocates a lot of global data (caching large amounts of data for performance reasons).  Insisting that all the objects on the heap are freed at program exit can mean that the program takes several minutes to exit.  In this sort of situation it is simpler and faster to simply exit and let the CRT/OS tidy up.

VLD

Anyway, I will endeavour to tidy up all the "leaks" in HexEdit.  Luckily, there is a library that makes this easy, often trivial, to do.  See http://vld.codeplex.com for more information.

If you have ever used the MS CRT debug heap to try to track down memory leaks you know that it can be extremely tedious, sometimes nigh impossible.  The problem is that, although it tells you where the allocation was made, it tells you nothing of how the flow of control got to that specific allocation.  For example, there could be thousands of CStrings used in a program and there may be just one that was not deallocated.  Sometimes the contents of the memory (eg the text in the string) may give a clue but not always.

The beauty of VLD is it provides a "call stack"  that allows you to very quickly track down the problem piece of code.

Wednesday 3 August 2011

Introduction

I get quite a few emails asking about Hex Edit, such as how or why I added a specific feature.  I hope this blog will help somebody.  It is mainly to discuss topics of interest that arise from my development of Hex Edit and now Hex Edit Pro.  That is, the techniques, tools, libraries etc that I use including benefits and problems of their use.


If you don't know about Hex Edit then it started as a free binary file editor in 1998.  Due to its great popularity and user demand I released a shareware version in 2001.  See http://www.hexedit.com for more information.  I didn't make it shareware to make gobs of money (sales would not even cover the cost of the hardware and software I use to make it) but as some motivation to continue development and also to gauge how much it is used and liked.  Also at least one user wanted to give me money to continue development, but the company they worked for did not allow them to make a donation, though they were quite happy to buy 20 copies of Hex Edit.


I have always released a complete earlier version of Hex Edit for free.  Hex Edit 3.0 was released as a free version in 2009 and is identical to the shareware version of Hex Edit 3.0 released in 2005 (apart from a few bug fixes).


One reason I could not release the source code for versions after 2.0 is that it used the commercial BCG library.  (Actually I did supply the source to several BCG users who requested it.)  Happily Microsoft bought the BCG code and added it to MFC 9.0 (more on this in a later post) - so I can now make Hex Edit completely open source.


With the release of Hex Edit 4.0 there will be a free open source version (Hex Edit 4.0) and a shareware version (Hex Edit Pro 4.0) with more features.  However, the free version is by no means incomplete having many features not found in many other hex editors.




Anyway back to the point of this blog ...  The blog is not intended as a discussion of how to use Hex Edit unless it is related to the general discussion.  (For example, I plan to discuss the use of bookmarks in Visual Studio and how Hex Edit bookmarks are better in some ways.)  It will probably be more of interest to programmers (particularly C++ programmers) than Hex Edit users but the intersection of these sets is fairly large.


I welcome feedback and specific questions about how or why something is done in Hex Edit.