Parse, Don’t Validate AKA Some C Safety Tips

esa@discuss.tchncs.de · 2 months ago

I feel I gotta point out it’s a pretty funny example: email comes up so often as the thing you’re advised to neither parse nor validate. Just try sending an email to the address and see if it works. If you need to know it was received, the usual method is a link to click.

But “parse, don’t validate” is still a generally good idea, no matter the example used. :)

thenextguy@lemmy.world · 2 months ago

I don’t see it. I would much prefer to validate early rather than late. The example of ‘other code might validate it differently or not at all’ seems specious. I don’t want invalid information “deep within the bowels of the system”.

esa@discuss.tchncs.de · 2 months ago

Parsing is a way of “validating early”. You either get a successful parse and the program continues working on known-good data with that knowledge encoded in the type system, or you handle incorrect data as soon as it’s encountered.
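A minimal C sketch of that idea, using the thread’s email example. The names `email_t`, `email_parse` and `email_domain` are hypothetical, and the accepted syntax (exactly one `@` with something on each side) is a deliberately crude assumption, much simpler than real address syntax. The point is the shape: the parser is the intended way to obtain an `email_t`, so downstream code that takes one can rely on the check having happened.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical wrapper: holding an email_t is meant to imply the string
   already passed email_parse(). C cannot stop anyone from filling the
   field by hand, so this is a convention the type checker supports,
   not a hard guarantee. */
typedef struct {
    const char *addr; /* borrowed from the caller; must outlive the email_t */
} email_t;

/* Crude rule assumed for this sketch: exactly one '@', with at least one
   character before it and at least one after it. */
bool email_parse(const char *s, email_t *out) {
    const char *at = strchr(s, '@');
    if (at == NULL || at == s || at[1] == '\0' || strchr(at + 1, '@') != NULL)
        return false;
    out->addr = s;
    return true;
}

/* Downstream code takes email_t, not char *. Because parsing guaranteed
   an '@', strchr() cannot return NULL here. */
const char *email_domain(email_t e) {
    return strchr(e.addr, '@') + 1;
}
```

Bad input is rejected at the boundary (`email_parse` returns false), and everything past that point works with a type that records the successful parse.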

thenextguy@lemmy.world · 2 months ago

I understand the concept. I just disagree that it’s a good idea.

zygo_histo_morpheus@programming.dev · 2 months ago

Why do you think it’s a bad idea? You and OP both agree that you should validate early, which seemed to be what your first comment was about. Is it encoding the fact that the data has been validated in the type system that you disagree with?

thenextguy@lemmy.world · 2 months ago

I disagree that parsing is validating. For example, you could give me a valid ISO date-time string, but I want a shipping date and you gave me something in the past. It parses, but it is not valid.

I also disagree that validating early is bad just because some other part of the code might validate again later and possibly do it differently. Yes, that’s bad, but it’s not a reason to avoid validating early.

zygo_histo_morpheus@programming.dev · 2 months ago

This article uses the term “parsing” in a non-standard way: it’s not just about transforming text into structured data, it’s about transforming more general data into more specific data. For example, you could have a function that “parses” valid dates into valid shipping dates: it returns an error if the input date is in the past, and otherwise returns a valid_shipping_date. That type would likely be identical to a normal date, but it carries extra semantic meaning and lets you leverage the type checker to make sure the check actually gets performed.
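A rough C rendering of the shipping-date “parse” described above. `date_t`, `valid_shipping_date` and `parse_shipping_date` are hypothetical, and the sketch assumes the `date_t` values are already real calendar dates produced by some earlier parser. `today` is passed in rather than read from the clock so the function stays deterministic.

```c
#include <stdbool.h>

/* Assumption: a date_t always holds a real calendar date. */
typedef struct {
    int year, month, day;
} date_t;

/* Same data as date_t, but a distinct type: a function that takes a
   valid_shipping_date cannot be handed an unchecked date_t. */
typedef struct {
    date_t date;
} valid_shipping_date;

/* Returns <0, 0 or >0, in the style of strcmp. */
static int date_cmp(date_t a, date_t b) {
    if (a.year != b.year) return a.year < b.year ? -1 : 1;
    if (a.month != b.month) return a.month < b.month ? -1 : 1;
    if (a.day != b.day) return a.day < b.day ? -1 : 1;
    return 0;
}

/* "Parses" a date into a shipping date: fails if the date is before today.
   Today itself is accepted in this sketch. */
bool parse_shipping_date(date_t d, date_t today, valid_shipping_date *out) {
    if (date_cmp(d, today) < 0)
        return false;
    out->date = d;
    return true;
}
```

The wrapper adds no data; its only job is to make “has been checked against today” visible to the type checker.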

Doing this would arguably be a bit overzealous; maybe it makes more sense to just parse strings into valid dates and merely validate that they also make sense as shipping dates. Still, any validation can be turned into a “parse” simply by adding type-level information to it.

Piatro@programming.dev · 2 months ago

I love the argument about C having type safety, with the little side-swipe at Rust. “AcTuAlLy C does have type safety! You just have to jump through the following 50 hoops to get it!” I’m an outsider to both C and Rust, but it’s still funny.

esa@discuss.tchncs.de · 2 months ago

It is pretty funny that C’s type system can be described pretty differently based on the speaker’s experience. The parable of the Blub language comes to mind.

Colloidal@programming.dev · 2 months ago

People who say that are thinking of strongly typed languages rather than type-safe languages. There’s a difference, and it looks like you’re onto it.

Magiilaro@feddit.org · 2 months ago

I prefer to do both: a validation check to see whether the data has the general form I expect, then parse whatever passed validation.
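A sketch of that two-step approach in C, with made-up rules for both steps: the validation pass only checks overall shape (1 to 254 characters, no whitespace or control characters, exactly one `@`), and the parse pass then splits the address into local and domain parts. `email_parts`, `looks_like_email` and `parse_email_parts` are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form: two views into the caller's string. */
typedef struct {
    const char *local;  size_t local_len;
    const char *domain; size_t domain_len;
} email_parts;

/* Step 1 (validate): cheap shape check under the assumed rules above. */
static bool looks_like_email(const char *s) {
    size_t len = strlen(s), ats = 0;
    if (len == 0 || len > 254) return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (isspace(c) || iscntrl(c)) return false;
        if (c == '@') ats++;
    }
    return ats == 1;
}

/* Step 2 (parse): runs only on input that passed step 1, so the '@' is
   known to exist; empty local or domain parts are still rejected. */
bool parse_email_parts(const char *s, email_parts *out) {
    if (!looks_like_email(s)) return false;
    const char *at = strchr(s, '@');
    size_t local_len = (size_t)(at - s);
    size_t domain_len = strlen(at + 1);
    if (local_len == 0 || domain_len == 0) return false;
    out->local = s;
    out->local_len = local_len;
    out->domain = at + 1;
    out->domain_len = domain_len;
    return true;
}
```

The parsed struct points into the caller’s string, so that string must outlive it.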

bitcrafter@programming.dev · 2 months ago

It is crazy to go to all the extra trouble of dealing with an additional pointer for the email_t type when it is just a struct wrapping a char * that could be passed around directly. A lot of the code in this example exists only to manage the lifetime of the extra email_t allocation, which seems like an unnecessary hoop to jump through.

esa@discuss.tchncs.de · 2 months ago

Isn’t that sort of just the cost of doing business in C? It’s a sparse language, so it falls to the programmer to cobble together the rest.

I also think the concrete example of emails should be taken as a stand-in. Errors like swapping a parameter in an email application are likely not very harmful, and would be detected early given the sheer volume of email. But in other, less fault-tolerant applications, this technique becomes a lot more valuable.
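One concrete version of the parameter-swapping case, as a sketch with hypothetical wrapper types `email_t` and `subject_t` and a hypothetical `format_headers` that builds header lines instead of actually sending mail. Because the two arguments have different struct types, a swapped call is rejected at compile time, whereas with two plain `const char *` parameters it would compile silently.

```c
#include <stdio.h>

/* Two hypothetical wrappers over the same representation. */
typedef struct { const char *addr; } email_t;
typedef struct { const char *text; } subject_t;

/* Writes the header lines into buf; returns what snprintf returns.
   Calling format_headers(buf, n, subject, to) is a compile error,
   because a subject_t cannot be passed where an email_t is expected. */
int format_headers(char *buf, size_t n, email_t to, subject_t subject) {
    return snprintf(buf, n, "To: %s\nSubject: %s\n", to.addr, subject.text);
}
```

For brevity a caller would wrap string literals directly, e.g. `email_t to = {"bob@example.org"};`; in real code the `email_t` would come from a parser.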

bitcrafter@programming.dev · 2 months ago

C supports passing structs around by value, so there was no need to allocate memory for it on the heap.
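A sketch of the by-value alternative, assuming a hypothetical `email_t` that wraps one borrowed `const char *` and a hypothetical `email_from_str` that returns a small result struct by value. Nothing is allocated, so there is nothing to free; copying an `email_t` copies one pointer, and on common ABIs the struct is exactly pointer-sized. The check itself (an `@` that is neither first nor last) is only a placeholder.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical email_t as a plain value type: no heap allocation. */
typedef struct {
    const char *addr; /* borrowed from the caller; must outlive every copy */
} email_t;

/* Result returned by value; the caller checks ok before using email. */
typedef struct {
    bool ok;
    email_t email;
} email_result;

email_result email_from_str(const char *s) {
    const char *at = strchr(s, '@');
    email_result r = { false, { NULL } };
    if (at != NULL && at != s && at[1] != '\0') {
        r.ok = true;
        r.email.addr = s;
    }
    return r; /* copied out to the caller; nothing to free */
}
```

The trade-off is that the struct borrows the caller’s string, so the usual C lifetime rule still applies: the string must outlive every `email_t` that points into it.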