All you have to do is keep improving by 3% every day, and we'll be able to fit the text version of Wikipedia on a floppy disk by the end of next year.
Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.
I'm starting to debug seriously now. Some recent bug fixes cost me: yesterday evening I was at 142 bytes and 42.8%, and I just gained back 5 bytes and 2%. 😀
Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community
I've got it working so far, but I broke the compression ratio. 🙁 The problem is that I'm always spending a bit to indicate whether a space follows. I just need the program to remember the last character of the previous token and print a space if that character and the first character of the next token are both letters.
It works better than the latest version of PrintTok2, which honors all or nearly all possible literals, but worse than my modification of the Z-Machine. BTW, assuming the space (instead of spending a bit on it) only saved 2 bytes.
I'm unhappy to announce that there was an error in my calculations: I was referring to the wrong tokens when tallying the results. 🙁 Now I'm at 156 bytes and 37.1%.
I was totally wrong, and I'm ecstatic! It's working now, and I'm getting 93 bytes compressed and 62.5% compressibility. 😀 I want 65% or better by the end of tomorrow.
I'm at 84 bytes and 66.2% right now. My modifications include shortening token codes to just as many bits as are needed, a bit after each token to indicate whether a space follows, compression of the tokens themselves, and treating certain punctuation marks as letters/digits.
I was wondering if you could do anything with binary chain codes (https://www.tinaja.com/text/chain01.html), which are sequences in which every substring of a given length is unique, so that no bit pattern repeats. So you've effectively got the 7-bit ASCII value of one character in 7 bits, two in 8 bits, three in 9 bits, four in 10 bits, etc., depending on the offset. Then maybe do something like laying out an ETAOINSHRDLU... letter-frequency distribution from the center outwards, so that the displacement from the center, or the jump to the next letter, is what you store.
Umm... I was wrong about my text compression techniques: some bug fixes cost me big time, and I don't know what I'm doing wrong. 🙁 I'm guessing I'm not handling tokenization well. The following are some code snippets, written in C, responsible for gathering and sorting tokens:
The problem seems to be that there are too few tokens, but when I relax the condition to just "saved," I get more tokens and the compression ratio suffers. I don't know what I'm doing wrong. 🙁
Good news! Over the past two hours, I gained about 8.1% compressibility with my variation of Toldo's technique on my text adventure's room descriptions. I still need to debug it, but first I want to buy some more points. 😀
I'm sorry, but at the time I was wrong. However, I've been debugging and optimizing. I found that one cause of the numbers I was getting was skipping every other character, because I was advancing twice instead of once. After that, I was doing very poorly, mainly because I was trying to compress the end-of-string (EOS) marker. Right now, the numbers are exceptional. 😀 And it works! 😀 I still need to decompress the compressed tokens and actually write them out: right now, the figures are calculated as if the tokens were compressed, but the tokens themselves aren't actually compressed yet. If I can get the tokens to decompress correctly and efficiently, I plan to let people try it out, benchmark it, and tell me how it stacks up. It currently targets cc65, but I plan to support other compilers, text adventure creation systems, and plain text files.
Maybe the numbers I was getting were due to a syntax error in the example test file that kept much of one string from compiling correctly, because after the correction, I was doing horribly. 🙁 I've gotten it to work several times, but the compression ratio was horrible. 🙁 While it's doing its job, it's doing it very poorly. 🙁 I believe my problem is with tokenization, as I'm getting way too few tokens. My modifications to Toldo's design are the best so far, but even they are doing poorly. Following is the code I'm using to collect the tokens in a version not based on Toldo's design: