Here is a collection of thoughts I have had after getting the CSS tokenizer algorithm working yesterday:
- The stream could be made asynchronous. This would mean tokenization could begin while the CSS is still being read.
- There are some parts of the implementation that don’t align with the wording of the specification as closely as I would like. For example, in the main loop that determines which type of token to parse, the specification’s wording indicates that the current character has already been consumed, whereas in my implementation it is the character about to be consumed.
- Parse errors are handled inconsistently. The specification clearly describes when they occur, but the implementation does not yet record them in a way that can be used for syntax highlighting later.
- The tokenization algorithm described in the specification has slightly different goals from the implementation I have created, which has caused some deviations. One of my goals is to retain every character, so that syntax highlighting loses no data, whereas the specification directs certain characters to be ignored. The specification’s goal is to interpret the CSS; e.g. it converts the values of escape sequences. When parsing for the purpose of syntax highlighting this can be skipped, since the original escape sequence is what gets displayed. I think I could store the spec-conformant value of such tokens alongside a ‘displayValue’ containing the original text from which the value was interpreted.
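To make the value/displayValue idea concrete, here is a minimal sketch. The `Token` shape, `decodeEscapes`, and `makeIdentToken` are my hypothetical names, not the actual implementation, and the escape decoding covers only the hex form (`\` followed by 1–6 hex digits and an optional whitespace terminator) rather than the spec's full algorithm:

```typescript
// Hypothetical token shape: `value` holds the spec-interpreted text,
// `displayValue` holds the verbatim source slice, so nothing is lost
// for syntax highlighting.
interface Token {
  type: string;
  value: string;        // interpreted, e.g. escapes decoded
  displayValue: string; // raw source text as written
}

// Simplified decoder for hex escapes like "\41" -> "A": up to six hex
// digits, optionally terminated by a single whitespace character.
function decodeEscapes(raw: string): string {
  return raw.replace(/\\([0-9a-fA-F]{1,6})\s?/g, (_, hex) =>
    String.fromCodePoint(parseInt(hex, 16))
  );
}

function makeIdentToken(raw: string): Token {
  return { type: "ident", value: decodeEscapes(raw), displayValue: raw };
}

const tok = makeIdentToken("\\41 bc"); // source text: \41 bc
// tok.value === "Abc", tok.displayValue preserves the escape as written
```

A consumer rendering highlighted output would use `displayValue`, while anything that needs the interpreted identifier (e.g. matching a known property name) would use `value`.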
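On the consumed/to-be-consumed mismatch: one way to match the specification's wording is a stream where `consume()` advances first and the "current" code point is the one just consumed, with a `reconsume()` to push it back. This is a sketch under my own naming, not the existing implementation:

```typescript
// Hypothetical stream matching the spec's wording: after consume(), the
// "current" code point is the one just consumed; reconsume() pushes it
// back so the next consume() returns it again.
class Stream {
  private pos = -1;
  constructor(private src: string) {}

  // Advance and return the new current code point (undefined at EOF).
  consume(): string | undefined {
    this.pos += 1;
    return this.src[this.pos];
  }

  // The most recently consumed code point.
  current(): string | undefined {
    return this.src[this.pos];
  }

  // Step back one position so current() is consumed again next time.
  reconsume(): void {
    this.pos -= 1;
  }
}
```

With this shape, the main loop can consume a code point and then dispatch on `current()`, which keeps the code's structure close to phrases like "reconsume the current input code point" in the spec.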
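The asynchronous-stream idea from the first bullet could look something like the following: an async generator that flattens incoming chunks into code points, so the tokenizer can start work before the whole stylesheet has arrived. This is purely a sketch of the shape of the API; `codePoints` and the chunk source are assumptions:

```typescript
// Sketch: flatten an async iterable of string chunks into individual
// code points, letting downstream tokenization begin per-chunk.
async function* codePoints(
  chunks: AsyncIterable<string>
): AsyncGenerator<string> {
  for await (const chunk of chunks) {
    // for..of over a string iterates by code point, not UTF-16 unit.
    for (const cp of chunk) yield cp;
  }
}

// Demo source: two chunks arriving separately.
async function demo(): Promise<string[]> {
  async function* chunks() {
    yield "a{";
    yield "b}";
  }
  const out: string[] = [];
  for await (const cp of codePoints(chunks())) out.push(cp);
  return out;
}
```

The tokenizer's main loop would then `for await` over `codePoints(...)` instead of indexing into a fully buffered string.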
Overall, when all is said and done, I was quite surprised by how simple the CSS specification for tokenization turned out to be. I could follow the rules defined in the documentation and, before I knew it, I had a working implementation. Well done to the W3C for their work on understandable technical documentation 👍