Meeting minutes
DICOM
erich: Still working on implementing. The spec is complicated.
Validating FHIR RDF examples
eric: We got validation working, with Yovka and Lia and Claude. Working in Jena. Never heard back from Deepak on how to execute the ShEx generator, but the remaining work is to run the examples and round-trip them through the HAPI implementation and then through the validator.
ACTION: DBooth to follow up w Deepak on connecting w Eric (and Jim?)
jim: Looks like everything I submitted was merged in March.
… If the shorthand syntax for booleans, ints and decimals appears in the examples, then they're up to date.
eric: Also copy Claude on the follow-up email.
jim: See these examples: https://
Erich's DICOM use case
Erich's slides: https://
erich: Halcyon does whole-slide imaging: biopsies, high-res scanning. Tissue samples are often stained to highlight particular materials.
… Images can exceed 100k x 100k pixels.
… Pathologists mark up parts of it to make ground truth, then use that as training data.
… After training, we want to apply these models to new data.
… The models are built with deep learning.
… How to build system to handle all this?
… For me, it's a matter of rebuilding it.
… RDF all the way. It's now bringing in both image data and other kinds of patient data.
… Basic architecture uses Jena with TDB2, Fuseki, and Jetty. Storage is based on SOLID. Uses the IIIF protocol and a tiling engine. Zephyr for multiview.
… Previously had Keycloak, but pulled it out.
… Use SPARQL queries.
… Big images are split into tiles, typically 256x256 pixels.
… Also using image pyramids for scaling up and down.
… Halcyon uses HTTP range requests to fetch a particular area at a particular scale.
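A minimal sketch of tile addressing, assuming a power-of-two pyramid where level 0 is full resolution and each level halves the one before, with the 256x256 tiles mentioned above. The function names and the (column, row) addressing are hypothetical, not Halcyon's or IIIF's actual scheme; each returned tile would then be fetched separately (the byte offsets used by HTTP range requests are not modeled here).

```python
from __future__ import annotations

TILE_SIZE = 256  # tile edge in pixels

def level_dimensions(width: int, height: int, level: int) -> tuple[int, int]:
    """Image size at a pyramid level, assuming each level halves the previous one."""
    scale = 2 ** level
    # Integer ceiling division, so a partial edge still counts as a pixel.
    return (width + scale - 1) // scale, (height + scale - 1) // scale

def tiles_for_region(x: int, y: int, w: int, h: int, level: int,
                     tile_size: int = TILE_SIZE) -> list[tuple[int, int]]:
    """(column, row) indices of the tiles at `level` covering a w x h region
    whose top-left corner (x, y) is given in full-resolution pixels."""
    scale = 2 ** level
    # Map the region's first and last pixel into this level's coordinates.
    x0, y0 = x // scale, y // scale
    x1, y1 = (x + w - 1) // scale, (y + h - 1) // scale
    return [(col, row)
            for row in range(y0 // tile_size, y1 // tile_size + 1)
            for col in range(x0 // tile_size, x1 // tile_size + 1)]

# A 100,000 x 100,000 image needs 391 x 391 tiles at full resolution...
print(len(tiles_for_region(0, 0, 100_000, 100_000, level=0)))  # 152881
# ...while a 1024 x 1024 region viewed at level 2 fits in a single tile.
print(tiles_for_region(0, 0, 1024, 1024, level=2))  # [(0, 0)]
```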
… Lots of polygons. I use Well-Known Text (WKT) and GeoSPARQL.
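For illustration, a minimal stdlib-only sketch that reads the exterior ring of a 2D WKT POLYGON (as stored in a GeoSPARQL wktLiteral, optionally preceded by a CRS IRI) and computes its bounding box and area. It ignores holes, multipolygons and 3D coordinates; a real system would use a proper geometry library.

```python
from __future__ import annotations

import re

Point = tuple[float, float]

_POLYGON = re.compile(r"POLYGON\s*\(\s*\(([^()]*)\)", re.IGNORECASE)

def parse_wkt_polygon(wkt: str) -> list[Point]:
    """Return the exterior ring of a simple 2D WKT POLYGON, closed (first point == last)."""
    match = _POLYGON.search(wkt)  # search() skips an optional leading CRS IRI
    if match is None:
        raise ValueError(f"not a simple WKT POLYGON: {wkt!r}")
    ring: list[Point] = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings should be closed; tolerate open input
    return ring

def bounding_box(ring: list[Point]) -> tuple[float, float, float, float]:
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)

def area(ring: list[Point]) -> float:
    """Shoelace formula over consecutive vertices of a closed ring."""
    twice = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))
    return abs(twice) / 2

ring = parse_wkt_polygon("POLYGON ((0 0, 10 0, 10 5, 0 5, 0 0))")
print(bounding_box(ring), area(ring))  # (0.0, 0.0, 10.0, 5.0) 50.0
```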
… Want to be able to query for, e.g., a male smoker with a tumor within 10 nm of a specified region.
… I use Hilbert curves.
… They have the nice property of preserving 2D locality.
… There's also the Z-order curve (z-curve) technique.
… Basically it maps 2D space into 1D space.
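To make the 2D-to-1D mapping concrete, here is a minimal sketch of the simpler of the two curves, Z-order (Morton) encoding, which interleaves the bits of the x and y cell coordinates. The 16-bit coordinate width is an arbitrary assumption. Consecutive Z-order codes can jump far apart at quadrant boundaries, whereas consecutive Hilbert indices always land on neighboring cells, which generally gives Hilbert-based indexes better locality.

```python
from __future__ import annotations

def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave coordinate bits: bit i of x -> bit 2i, bit i of y -> bit 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def morton_decode(code: int, bits: int = 16) -> tuple[int, int]:
    """Inverse of morton_encode."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y

print([morton_encode(x, y) for y in range(2) for x in range(2)])  # [0, 1, 2, 3]
print(morton_encode(3, 5), morton_decode(39))  # 39 (3, 5)
```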
… A polygon can be expressed as a series of intersections with the Hilbert curve.
… Hilbert curves can be extended to n-dimensional space.
… This helps w query performance.
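A minimal sketch of the Hilbert mapping (the well-known iterative xy-to-index conversion for an n x n grid, n a power of two) and of turning a region into a series of 1D index ranges. Covering a rectangle cell by cell is a brute-force simplification chosen for clarity; it is not necessarily how Halcyon computes polygon intersections. Each resulting (start, end) range can then be matched against a 1D index with a simple range filter.

```python
from __future__ import annotations

def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_ranges(n: int, x0: int, y0: int, x1: int, y1: int) -> list[tuple[int, int]]:
    """Inclusive Hilbert-index ranges covering the inclusive cell rectangle [x0..x1] x [y0..y1]."""
    indices = sorted(hilbert_index(n, x, y)
                     for x in range(x0, x1 + 1) for y in range(y0, y1 + 1))
    ranges: list[tuple[int, int]] = []
    for d in indices:
        if ranges and d == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], d)  # extend the current run
        else:
            ranges.append((d, d))  # start a new run
    return ranges

print(hilbert_ranges(4, 0, 0, 1, 1))  # [(0, 3)]
print(hilbert_ranges(4, 1, 0, 2, 0))  # [(1, 1), (14, 14)]
```

Because of the curve's locality, a compact region usually collapses into only a few ranges, which is what keeps the 1D lookups cheap.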
… Using GeoSPARQL, with PROV.
… Annotating w classification based on probability threshold.
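A minimal sketch of thresholded annotation, assuming each detected region comes with a predicted class and a probability. The ex: property names, the Prediction class and the 0.5 default threshold are hypothetical placeholders; in practice the class could be a SNOMED concept IRI, and PROV provenance for the model run would be attached as well.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    region: str         # IRI of a segmented polygon (hypothetical)
    label: str          # IRI of the predicted class, e.g. a SNOMED concept
    probability: float  # model output in [0, 1]

Triple = tuple[str, str, object]

def annotate(predictions: list[Prediction], threshold: float = 0.5) -> list[Triple]:
    """Emit classification triples only for predictions at or above the threshold."""
    triples: list[Triple] = []
    for p in predictions:
        if p.probability >= threshold:
            triples.append((p.region, "ex:hasClassification", p.label))
            triples.append((p.region, "ex:probability", p.probability))
    return triples

preds = [Prediction("ex:poly1", "ex:Tumor", 0.92), Prediction("ex:poly2", "ex:Tumor", 0.31)]
print(annotate(preds))  # only ex:poly1 is annotated
```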
… Problem is that the amount of data is growing. One image can be 100M triples.
… Some devices generate multiple channels, bringing it up over a billion triples per image.
… Don't know how I'll handle trillions.
… Everything is being indexed together.
… Maybe move each feature set into its own store.
… Looked at HDT.
… But it represents all literals as strings, and lots of my data is numbers.
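A minimal sketch of why typed numeric storage matters, assuming a hypothetical layout that dictionary-encodes IRIs to integer IDs (HDT's dictionary does this for all terms, literals included) but keeps numeric values as raw doubles in a column. Stored as strings, numbers compare lexicographically ("9.5" sorts after "10.0") and must be parsed before any range filter; stored as doubles they take a fixed 8 bytes and compare directly. TermDictionary and NumericPropertyColumn are illustrative names, not BeakGraph's or HDT's actual format.

```python
from __future__ import annotations

from array import array

class TermDictionary:
    """Assigns dense integer IDs to IRIs and string terms (dictionary encoding)."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._terms: list[str] = []

    def encode(self, term: str) -> int:
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, term_id: int) -> str:
        return self._terms[term_id]

class NumericPropertyColumn:
    """Hypothetical column for one numeric predicate: subject IDs plus raw double values."""

    def __init__(self) -> None:
        self.subjects = array("I")  # unsigned integer subject IDs
        self.values = array("d")    # 8-byte doubles, no string parsing needed

    def add(self, subject_id: int, value: float) -> None:
        self.subjects.append(subject_id)
        self.values.append(value)

    def subjects_in_range(self, lo: float, hi: float) -> list[int]:
        return [s for s, v in zip(self.subjects, self.values) if lo <= v <= hi]

terms = TermDictionary()
areas = NumericPropertyColumn()  # e.g. values of a hypothetical ex:area predicate
areas.add(terms.encode("ex:nucleus1"), 9.5)
areas.add(terms.encode("ex:nucleus2"), 10.0)
print("9.5" > "10.0")  # True: string order is wrong for numbers
print([terms.decode(s) for s in areas.subjects_in_range(9.0, 9.9)])  # ['ex:nucleus1']
```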
… I went off on my own working on it.
… BeakGraph is backed by Arrow.
… Halcyon recognizes this as a research object.
… Central Jena store has a ref to BeakGraphs, and loads them when needed.
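A minimal sketch of load-on-demand, assuming the central store holds only a reference (e.g. an IRI or file path) per BeakGraph and a loader function opens it. The LazyGraphRegistry class, the beak: references, the loader and the least-recently-used cap on open graphs are hypothetical additions for illustration; the minutes only say graphs are loaded when needed.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Generic, TypeVar

G = TypeVar("G")

class LazyGraphRegistry(Generic[G]):
    """Maps graph references to loaded graphs, loading each on first use and
    keeping at most `capacity` open (least recently used is dropped first)."""

    def __init__(self, loader: Callable[[str], G], capacity: int = 4) -> None:
        self._loader = loader
        self._capacity = capacity
        self._loaded: OrderedDict[str, G] = OrderedDict()

    def get(self, ref: str) -> G:
        if ref in self._loaded:
            self._loaded.move_to_end(ref)  # mark as most recently used
            return self._loaded[ref]
        graph = self._loader(ref)  # the expensive step: open the file behind ref
        self._loaded[ref] = graph
        if len(self._loaded) > self._capacity:
            self._loaded.popitem(last=False)  # evict the least recently used
        return graph

    def is_loaded(self, ref: str) -> bool:
        return ref in self._loaded

opened: list[str] = []

def open_graph(ref: str) -> str:
    """Stand-in loader that records each open and returns a placeholder graph."""
    opened.append(ref)
    return f"<graph {ref}>"

registry = LazyGraphRegistry(open_graph, capacity=2)
for ref in ["beak:a", "beak:a", "beak:b", "beak:c"]:
    registry.get(ref)
print(opened)                        # ['beak:a', 'beak:b', 'beak:c']
print(registry.is_loaded("beak:a"))  # False: evicted when beak:c was loaded
```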
… I've approached limits even with Arrow.
… Now working on a new version, that's HTTP wrapped in HDT.
… This also allows me to focus on the theoretical concepts, to get HDT implemented right.
… HDT allows bit packing, and that helps a lot.
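A minimal sketch of the bit-packing idea: when integer IDs are bounded by some maximum, each can be stored in just enough bits for that maximum instead of a full machine word (HDT's compact ID sequences rely on the same idea). The pack/unpack helpers are illustrative, use a Python big integer as the bit buffer for simplicity, and are not HDT's on-disk layout.

```python
from __future__ import annotations

def pack(values: list[int]) -> tuple[bytes, int]:
    """Pack non-negative ints using the minimum fixed bit width for the largest value."""
    width = max(1, max(values, default=0).bit_length())
    buffer = 0
    for i, v in enumerate(values):
        buffer |= v << (i * width)  # value i occupies bits [i*width, (i+1)*width)
    nbytes = (len(values) * width + 7) // 8
    return buffer.to_bytes(nbytes, "little"), width

def unpack(data: bytes, width: int, count: int) -> list[int]:
    """Inverse of pack: read `count` values of `width` bits each."""
    buffer = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(buffer >> (i * width)) & mask for i in range(count)]

ids = [5, 0, 7, 3, 6]
data, width = pack(ids)
print(width, len(data))               # 3 2  (15 bits in 2 bytes instead of 5 x 8 bytes)
print(unpack(data, width, len(ids)))  # [5, 0, 7, 3, 6]
```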
… What about stuffing RDF into DICOM?
… Would like to be able to stuff billions of triples into an HDF5 file.
… For UI, I'm using Apache Wicket, which is Java-driven.
… When you pass a java object to the framework, it generates the display.
… I'm adding in Jena, making Vandegraph.
… Wicket has an interface that I used to make RDF a first-class citizen.
… The object passed to Wicket would be a Jena graph or a resource.
… For example, a resource "MyURI".
… It might list the properties in alphabetical order.
… SHACL comes into play when I want it to display a certain way.
… When the data means a particular something, I can make it display how I want.
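A minimal sketch of shape-driven rendering order, with plain dictionaries standing in for the Jena resource and the SHACL shape. With no shape, properties are listed alphabetically; when a shape gives a property an sh:order value (one of SHACL's non-validating display hints), ordered properties come first in ascending order and the rest follow alphabetically. The display_rows function and its inputs are hypothetical, not Wicket or Jena API.

```python
from __future__ import annotations

def display_rows(properties: dict[str, list[str]],
                 shape_order: dict[str, float] | None = None) -> list[tuple[str, list[str]]]:
    """Return (predicate, sorted values) rows in display order.

    properties:  predicate -> values of one resource (e.g. "MyURI")
    shape_order: predicate -> sh:order from a matching property shape, if any
    """
    order = shape_order or {}

    def sort_key(predicate: str) -> tuple[int, float, str]:
        if predicate in order:
            return (0, order[predicate], predicate)  # shape-ordered properties first
        return (1, 0.0, predicate)                   # then the rest alphabetically

    return [(p, sorted(properties[p])) for p in sorted(properties, key=sort_key)]

props = {"ex:width": ["100000"], "ex:label": ["slide 1"], "ex:height": ["100000"]}
print([p for p, _ in display_rows(props)])
# ['ex:height', 'ex:label', 'ex:width']
print([p for p, _ in display_rows(props, {"ex:label": 1, "ex:width": 2})])
# ['ex:label', 'ex:width', 'ex:height']
```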
… SPARQL result set gets passed to Wicket.
… Trying to set up login to be able to switch it to SOLID.
… After login I can see results of a SPARQL SELECT using Jena.
… RDF wrapper around a dimensional renderer.
… There's a SPARQL endpoint too.
… Not full compliance with SOLID, but it's the architectural plan.
… It will fire up SPARQL endpoints as needed when going to a BeakGraph.
… Federating queries.
… TDB2 is basically a dataset of datasets.
… At the moment Jena does those federated queries one by one. I'd rather do them in parallel.
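A minimal sketch of the difference between one-by-one and parallel federation, with each endpoint modeled as a plain Python function from a query string to result rows (a hypothetical stand-in, not Jena's federation API). The parallel version submits all sub-queries to a thread pool, which suits I/O-bound HTTP calls, and Executor.map returns results in endpoint order, so the merged rows match the sequential version.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Row = dict[str, str]
Endpoint = Callable[[str], list[Row]]

def query_sequential(endpoints: list[Endpoint], query: str) -> list[Row]:
    """Ask each endpoint in turn; total time is the sum of their latencies."""
    rows: list[Row] = []
    for endpoint in endpoints:
        rows.extend(endpoint(query))
    return rows

def query_parallel(endpoints: list[Endpoint], query: str, max_workers: int = 8) -> list[Row]:
    """Ask all endpoints concurrently; total time is roughly the slowest latency."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `endpoints`, not completion order.
        parts = list(pool.map(lambda endpoint: endpoint(query), endpoints))
    return [row for part in parts for row in part]

def fake_endpoint(name: str, delay: float) -> Endpoint:
    """Hypothetical endpoint that sleeps to simulate network latency."""
    def run(query: str) -> list[Row]:
        time.sleep(delay)
        return [{"source": name, "query": query}]
    return run

endpoints = [fake_endpoint(n, 0.2) for n in ("beak1", "beak2", "beak3")]
rows = query_parallel(endpoints, "SELECT * WHERE { ?s ?p ?o }")  # about one delay, not three
print([r["source"] for r in rows])  # ['beak1', 'beak2', 'beak3']
```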
… Found another group also working on similar stuff. Meeting w them tomorrow.
… Josh Moore is the connecting agent.
… They want to stuff Zarr files into HDT. He's an RDF fan.
… I also developed a 3D graph viewer, to see if I was doing the data correctly.
… We can look at images with the multiviewer.
… It talks to the tiling viewer. If you navigate one image, the others move in sync.
… Working well. But it was a dead end, because a microscope can focus at different levels. Now we have stacks of whole-slide images representing layers.
… I can zoom in and out fast because it's low res. As you zoom in, you ask for higher res, but constrain your field of view. That multiscale pyramid helps with that.
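A minimal sketch of how a viewer might pick a pyramid level, assuming power-of-two levels with level 0 at full resolution: choose the coarsest level that still supplies at least one image pixel per screen pixel for the visible region. Because the visible region shrinks as you zoom in, the chosen level gets finer while the number of tiles requested stays roughly constant. choose_level and its parameters are hypothetical.

```python
def choose_level(region_width: int, viewport_width: int, max_level: int) -> int:
    """Coarsest power-of-two level at which the region is still >= viewport_width pixels wide.

    region_width:   width of the visible region in full-resolution pixels
    viewport_width: width of the on-screen view in screen pixels
    max_level:      coarsest level available in the pyramid
    """
    level = 0
    while level < max_level and region_width >> (level + 1) >= viewport_width:
        level += 1
    return level

print(choose_level(100_000, 1_000, max_level=9))  # 6: whole slide, 1562 px wide at level 6
print(choose_level(4_000, 1_000, max_level=9))    # 2: zoomed in, finer level
print(choose_level(800, 1_000, max_level=9))      # 0: beyond full resolution, stay at level 0
```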
… But the problem is shifting to 3D space.
… Zephyr is a new viewer, showing an image tilted in 3D.
… So it's a mix of multi-resolution images, blended.
… Now having 3D I can also use it for 2D.
eric: Can you pan at an angle?
erich: Yes.
erich: Using SNOMED URIs for annotating
… Rendering is controlled by SHACL applied to the display data.
… Also using DASH.
… Still need to do modeling of image and feature stack, DICOM pathology WI support, GeoSPARQL operators, and code cleanup.
eric: Any interest beyond 3D, like adding time dimension, or different study axis?
erich: I haven't had to deal with more than 3D yet, but keeping it in mind.
… Want to implement composite datatypes and map them to HDF5.
… There's a group that did PyRadiomics. They extracted those features from DICOM images.
… They defined the operators well, but we discovered we can use them for pathology.
… Might make RDF version of it.
… Roadmap: 1. Federated learning; 2. LLM-driven query interface using SHACL-RAG; 3. Alignment of storage with W3C LWS.
(erich does demo)
erich: I like Wicket because I can use both Java and JS.
ADJOURNED